I encountered an error when running the torchrun command on my system; the full traceback follows:
Traceback (most recent call last):
  File "/mnt/f/projects/python/git/llama/venv/bin/torchrun", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/mnt/f/projects/python/git/llama/venv/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/mnt/f/projects/python/git/llama/venv/lib/python3.11/site-packages/torch/distributed/run.py", line 762, in main
    run(args)
  File "/mnt/f/projects/python/git/llama/venv/lib/python3.11/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/mnt/f/projects/python/git/llama/venv/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/f/projects/python/git/llama/venv/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 237, in launch_agent
    result = agent.run()
             ^^^^^^^^^^^
  File "/mnt/f/projects/python/git/llama/venv/lib/python3.11/site-packages/torch/distributed/elastic/metrics/api.py", line 129, in wrapper
    result = f(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^
  File "/mnt/f/projects/python/git/llama/venv/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run
    result = self._invoke_run(role)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/f/projects/python/git/llama/venv/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 844, in _invoke_run
    self._initialize_workers(self._worker_group)
  File "/mnt/f/projects/python/git/llama/venv/lib/python3.11/site-packages/torch/distributed/elastic/metrics/api.py", line 129, in wrapper
    result = f(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^
  File "/mnt/f/projects/python/git/llama/venv/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 681, in _initialize_workers
    worker_ids = self._start_workers(worker_group)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/f/projects/python/git/llama/venv/lib/python3.11/site-packages/torch/distributed/elastic/metrics/api.py", line 129, in wrapper
    result = f(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^
  File "/mnt/f/projects/python/git/llama/venv/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/local_elastic_agent.py", line 271, in _start_workers
    self._pcontext = start_processes(
                     ^^^^^^^^^^^^^^^^
  File "/mnt/f/projects/python/git/llama/venv/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/__init__.py", line 207, in start_processes
    redirs = to_map(redirects, nprocs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/f/projects/python/git/llama/venv/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 162, in to_map
    map[i] = val_or_map.get(i, Std.NONE)
             ^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'get'
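The failing frame is torch's own to_map helper. From the line shown in the traceback (map[i] = val_or_map.get(i, Std.NONE)), it appears to expect either an Std value or a dict mapping local rank to Std, so redirects seems to have reached start_processes as None. If I am reading that code path correctly, this minimal sketch (my own repro attempt, not part of the original run) should raise the exact same error:

    # Minimal repro sketch: assuming to_map treats any non-Std argument as a
    # rank -> Std dict, passing None should hit the same .get() call and fail.
    from torch.distributed.elastic.multiprocessing.api import to_map

    to_map(None, 1)  # AttributeError: 'NoneType' object has no attribute 'get'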
I am running torchrun with the --nproc_per_node 1 option and passing the example.py script as the entrypoint, along with the --ckpt_dir and --tokenizer_path script arguments. I have downloaded the 7B files, verified their checksums, and set $TARGET_FOLDER. I am not sure what caused this error or how to resolve it. Here is the command I ran:
$ torchrun --nproc_per_node 1 example.py --ckpt_dir $TARGET_FOLDER/7B --tokenizer_path $TARGET_FOLDER/tokenizer.model
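In case it helps narrow things down, here is a quick environment sanity check I put together (my own sketch; the Std.from_str line reflects my assumption, based on the traceback, about how torchrun turns its default --redirects value into an Std):

    # Sanity check, run inside the same venv that provides torchrun.
    import torch
    import torch.distributed
    from torch.distributed.elastic.multiprocessing.api import Std

    print(torch.__version__)                 # which torch the venv resolves
    print(torch.distributed.is_available())  # distributed support compiled in?
    print(Std.from_str("0"))                 # assumed parse of the default --redirects value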
Can you please help me diagnose the issue and find a solution? Thank you.