
Utils script to create users on all nodes (login, controller and compute), run from any node#1061

Open: mayankgupta14 wants to merge 1 commit into main from mayankpg



@mayankgupta14
Collaborator


Purpose

Changes

Adds a utility script that creates users on all nodes (login, controller, and compute). It complements add_users.sh, which runs only during provisioning: this script can be run from any node at any time. It detects existing users and their UIDs, creates the missing users, updates shared_users.txt, and can optionally upload the updated file to S3.
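The core bookkeeping step (detect existing users and UIDs, then assign UIDs to new users so they match on every node) might look roughly like the sketch below. It is not the script from this PR. It assumes that each line of shared_users.txt has the form username,uid,home_dir, and that cluster UIDs start at a hypothetical floor of 2001 with homes under a hypothetical /fsx root. Check both assumptions against your add_users.sh.

```python
import pwd
from pathlib import Path

# ASSUMPTIONS (hypothetical, not taken from the PR):
#   - shared_users.txt lines look like "username,uid,home_dir"
#   - cluster users get UIDs >= MIN_UID and homes under HOME_ROOT
MIN_UID = 2001
HOME_ROOT = "/fsx"


def parse_shared_users(text):
    """Return {username: uid} from shared_users.txt-style text."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(",")
        users[parts[0].strip()] = int(parts[1])
    return users


def read_shared_users(path):
    """Parse the file if it exists; a missing file means no shared users yet."""
    p = Path(path)
    return parse_shared_users(p.read_text()) if p.exists() else {}


def local_uids():
    """UIDs already present in this node's passwd database."""
    return {entry.pw_uid for entry in pwd.getpwall()}


def plan_new_users(names, shared, taken_uids, home_root=HOME_ROOT, min_uid=MIN_UID):
    """Assign a UID to every name not already registered in `shared`.

    Existing users keep their UID. Each new user gets the lowest UID >= min_uid
    that is not used in shared_users.txt or in the local passwd database.
    Returns the new "username,uid,home_dir" lines to append to the file.
    """
    shared = dict(shared)  # work on a copy so the caller's dict is untouched
    used = set(shared.values()) | set(taken_uids)
    next_uid = min_uid
    new_lines = []
    for name in names:
        if name in shared:
            continue  # already registered (or repeated in `names`)
        while next_uid in used:
            next_uid += 1
        used.add(next_uid)
        shared[name] = next_uid
        new_lines.append(f"{name},{next_uid},{home_root}/{name}")
    return new_lines
```

Each returned line could then be applied on every node with something like `useradd -u <uid> -d <home_dir> -m <username>`, appended to shared_users.txt, and optionally copied to S3 with `aws s3 cp`. How the actual script reaches the login, controller, and compute nodes is not described here.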

Test Plan

Environment:

  • AWS Service:
  • Instance type:
  • Number of nodes:

Test commands:

Test Results

Directory Structure

3.test_cases/
└── <framework>/                # e.g. pytorch, megatron, jax
    └── <library>/              # e.g. picotron, FSDP, megatron-lm
        └── <model>/            # e.g. SmolLM-1.7B (may be omitted for single-model cases)
            ├── Dockerfile      # Container / environment setup
            ├── README.md       # Overview, prerequisites, usage
            ├── slurm/          # Slurm-specific launch scripts
            ├── kubernetes/     # Kubernetes manifests
            └── hyperpod-eks/   # HyperPod EKS instructions
  • Top-level files (Dockerfile, README.md, training scripts, configs) cover general setup.
  • Subdirectories (slurm/, kubernetes/, hyperpod-eks/) contain service-specific launch instructions.
  • Not all service subdirectories are required — include only the ones relevant to your test case.

Checklist

  • I have read the contributing guidelines.
  • I am working against the latest main branch.
  • I have searched existing open and recently merged PRs to confirm this is not a duplicate.
  • The contribution is self-contained with documentation and scripts.
  • External dependencies are pinned to a specific version or tag (no latest).
  • A README is included or updated with prerequisites, instructions, and known issues.
  • New test cases follow the expected directory structure.

@mayankgupta14 mayankgupta14 requested a review from a team April 8, 2026 12:35