GIANT/ contains the versioned iterations of the main language model codebase.
- v0 and v1 are old and not recommended for use. They serve as a good learning resource because they are simpler and mimic the early days of GPT-1/2/3.
- v2 is the stable base and the current reference implementation.
- v3 builds on top of v2 and is the active branch for training-system improvements such as multi-GPU training, plus more complex additions to the transformer like MLA.
The next big steps planned for v3 are:
- More advanced sharding beyond plain data parallelism
- Better partitioning of model and optimizer state
- More mature distributed execution strategies
- MLA-related experiments once the rest of the training stack is stable enough
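To make the second bullet concrete, here is a minimal, framework-free sketch of ZeRO-style partitioning of optimizer state across data-parallel ranks. This is illustrative only, not GIANT code: the function name and shapes are assumptions, and a real implementation would operate on framework tensors and collectives rather than index ranges.

```python
def partition_params(num_params: int, world_size: int) -> list[range]:
    """Split parameter indices into near-equal contiguous shards, one per rank.

    Each data-parallel rank keeps the full model parameters but only
    stores and updates the optimizer state for its own shard.
    """
    base, rem = divmod(num_params, world_size)
    shards, start = [], 0
    for rank in range(world_size):
        # The first `rem` ranks take one extra parameter each.
        size = base + (1 if rank < rem else 0)
        shards.append(range(start, start + size))
        start += size
    return shards

# Example: 10 parameters sharded across 4 ranks.
shards = partition_params(num_params=10, world_size=4)
# Rank 0 owns indices 0-2, rank 1 owns 3-5, ranks 2 and 3 own 2 indices each.
```

After each step, every rank updates only its shard and the updated parameters are re-gathered across ranks; this trades extra communication for a world_size-fold reduction in per-rank optimizer memory.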
If you want the stable path, start with GIANT/v2.
If you want the newest multi-GPU work, look at GIANT/v3.