Conversation
…s. I also discovered what I think is a bit of a bug, where I was casting wte to bfloat16 in the wrong place (the model init) instead of in init_weights
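A minimal sketch of the fix described above (the module layout here is assumed, not the actual nanochat code): cast the wte embedding to bfloat16 inside init_weights rather than in the constructor, so the cast always happens after initialization.

```python
import torch
import torch.nn as nn

class GPT(nn.Module):
    # Hypothetical skeleton to illustrate where the cast belongs.
    def __init__(self, vocab_size: int, n_embd: int):
        super().__init__()
        self.wte = nn.Embedding(vocab_size, n_embd)
        self.init_weights()  # cast happens in here, not inline above

    def init_weights(self):
        nn.init.normal_(self.wte.weight, mean=0.0, std=0.02)
        # The bfloat16 cast goes here, after the init, so the embedding
        # ends up in the intended dtype whenever init_weights runs.
        self.wte.to(dtype=torch.bfloat16)
```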
…rors still, so wip
ok but it sounds like it's unrelated to this PR or running on mac. the only thing that is mostly preventing me from merging this branch to master is the toml issue i think. looking...
Add mps and cpu dependency management
@burtenshaw thanks, but i get an error with this (doing uv sync on my macbook). I think it's time I spend some quality time with the uv docs.
@karpathy Sorry! Then it could just be as simple as limiting the cuda index to only linux:
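One way that could look in pyproject.toml, sketched under the assumption that the repo uses uv's named-index mechanism (the index name and CUDA version below are placeholders, not values from the repo):

```toml
[tool.uv.sources]
# Only pull torch from the CUDA index on Linux; everywhere else
# (macOS, Windows) fall back to the default PyPI wheels.
torch = [
    { index = "pytorch-cuda", marker = "sys_platform == 'linux'" },
]

[[tool.uv.index]]
name = "pytorch-cuda"
url = "https://download.pytorch.org/whl/cu128"  # placeholder CUDA build
explicit = true
```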
This should still be checked against the gpu and cpu configs to see whether it throws errors.
@@ -44,6 +44,6 @@ def document_batches():
    inputs_cpu = scratch[:-1].to(dtype=torch.int32)
FYI: I'm getting the following error at L42
!self.is_mps() INTERNAL ASSERT FAILED at "/Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp":1414, please report a bug to PyTorch. as_strided_tensorimpl does not work with MPS; call self.as_strided(...) instead
To fix it, change L41-42 to
tokens = [token_buffer.popleft() for _ in range(needed_tokens)]
scratch = torch.tensor(tokens, dtype=torch.int64, pin_memory=device.type == "cuda")
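A runnable sketch of that suggested fix (the helper name make_batch is hypothetical; token_buffer is assumed to be a collections.deque of token ids, as in the patch): building the tensor from a plain Python list avoids the as_strided path that asserts on MPS, and gating pin_memory on the device type keeps it a CUDA-only optimization.

```python
from collections import deque

import torch

def make_batch(token_buffer: deque, needed_tokens: int,
               device: torch.device) -> torch.Tensor:
    # Drain exactly needed_tokens ids from the front of the buffer.
    tokens = [token_buffer.popleft() for _ in range(needed_tokens)]
    # pin_memory speeds up async host->device copies on CUDA only;
    # guard it so CPU and MPS runs don't trip over it.
    return torch.tensor(tokens, dtype=torch.int64,
                        pin_memory=(device.type == "cuda"))
```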
@karpathy - I successfully got this running on a Windows machine (CPU-only). I encountered a few issues during setup, which I have detailed below along with their resolutions. Getting the nanochat-d32 model running on Windows CPU wasn't straightforward, but the journey was incredibly educational! Issues encountered:
Attaching a screenshot of the nanochat application running:
yes, this is working on macOS with an M2
…various other minor changes, e.g. changing max_iterations to num_iterations in the sft script for consistency in naming
@karpathy Instead of assuming GPU support for Linux users, a cleaner way to do it would be to make the device-specific package index for Torch an "extra", switching to the extra like my PR showed, adding an index per device. If the device-specific indexes are all autodetected, it could "just work".
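A sketch of that extras-based layout, loosely following the pattern in uv's PyTorch integration docs (the extra names, index names, torch version bound, and CUDA build below are illustrative, not taken from the PR):

```toml
[project.optional-dependencies]
cpu = ["torch>=2.4"]
gpu = ["torch>=2.4"]

[tool.uv]
# The two extras resolve to conflicting torch builds, so forbid mixing them.
conflicts = [[{ extra = "cpu" }, { extra = "gpu" }]]

[tool.uv.sources]
torch = [
    { index = "pytorch-cpu", extra = "cpu" },
    { index = "pytorch-cuda", extra = "gpu" },
]

[[tool.uv.index]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
explicit = true

[[tool.uv.index]]
name = "pytorch-cuda"
url = "https://download.pytorch.org/whl/cu128"  # illustrative CUDA build
explicit = true
```

With something like this, `uv sync --extra cpu` or `uv sync --extra gpu` would pick the matching index.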
… flagship build which is linux. sorry to pollute the repo history...


WIP, allowing people to run the code either on CPU (any potato) or MPS (MacBook GPUs)
Atm struggling a bit to figure out how to adjust the pyproject.toml to switch pytorch to the basic version on demand. Current workaround is to delete these lines from the pyproject.toml: