If you're a large model developer seeking Chitu compatibility for your model:
- Submit a Pull Request - our team will review and merge after confirmation (see CONTRIBUTING)
- For technical difficulties, contact our support team at solution@chitu.ai
If you're developing or using an unsupported chip architecture:
- Submit a Pull Request for review (see CONTRIBUTING)
- For adaptation challenges, email solution@chitu.ai
Solution: Store weights in FP8 format but execute computations in BF16 (similar to w8a16 quantization, where the "8" refers to float8 rather than int8).
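The idea above can be sketched in a few lines: keep weights in a compact low-precision format and upcast them to the compute precision just before the matmul. This is an illustrative stand-in, not Chitu's actual code: numpy has no FP8 or BF16 dtypes, so float16 plays the role of FP8 storage and float32 plays the role of BF16 compute, with a simple per-tensor scale.

```python
import numpy as np

def quantize_weights(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Store weights in a low-precision format plus a per-tensor scale.

    float16 is a stand-in for FP8 storage here (numpy has no FP8 dtype).
    """
    scale = float(np.abs(w).max()) or 1.0
    w_lp = (w / scale).astype(np.float16)
    return w_lp, scale

def matmul_w8a16(x: np.ndarray, w_lp: np.ndarray, scale: float) -> np.ndarray:
    """Dequantize to the compute dtype on the fly, then run the matmul there.

    float32 is a stand-in for BF16 compute.
    """
    w_hp = w_lp.astype(np.float32) * scale
    return x.astype(np.float32) @ w_hp

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
x = rng.standard_normal((4, 64)).astype(np.float32)

w_lp, scale = quantize_weights(w)
y = matmul_w8a16(x, w_lp, scale)

# The low-precision path tracks the full-precision matmul
# up to a small quantization error.
err = float(np.abs(y - x @ w).max())
```

The weights stay small in memory; only the working copy inside the kernel is widened, which is what makes the scheme attractive on chips without native FP8 arithmetic.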
Note: Converting between floating-point formats is technically more involved than integer conversion. Technical details are explained in this Zhihu article.
While this typically improves the cost-performance ratio rather than raw speed, in some cases it yields both compute savings and a speedup. This Zhihu analysis explains when those cases occur.
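One way such speedups can arise is in memory-bandwidth-bound decoding: each generated token streams the whole weight matrix once, so halving the bytes per weight roughly halves the transfer time even though the arithmetic still runs in BF16. A back-of-the-envelope roofline estimate, with purely illustrative hardware and model numbers (not measurements of Chitu):

```python
# Roofline-style estimate: in bandwidth-bound decode, step time is
# approximately bytes moved / memory bandwidth. All numbers below are
# illustrative assumptions, not benchmarks.

GB = 1e9
bandwidth = 2000 * GB   # assumed HBM bandwidth, bytes/s
params = 70e9           # assumed model size, parameters

def decode_step_time(bytes_per_weight: float) -> float:
    """Single-token decode dominated by streaming the weights once."""
    return params * bytes_per_weight / bandwidth

t_bf16 = decode_step_time(2.0)  # BF16 weights: 2 bytes each
t_fp8 = decode_step_time(1.0)   # FP8 weights: 1 byte each

speedup = t_bf16 / t_fp8        # 2x in the bandwidth-bound limit
```

In practice the dequantization overhead and any compute-bound phases (e.g. large-batch prefill) eat into this bound, which is why the speedup is the exception rather than the rule.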
Chitu complements rather than replicates existing solutions by focusing on:
- Native support for non-NVIDIA chips (e.g., Ascend/Muxi/Hygon)
- Seamless scalability from minimal to large-scale deployments
Consider Chitu if you:
- Use non-NVIDIA chips (Ascend/Muxi/Hygon/etc.)
- Employ heterogeneous computing (mixed chips)
- Require high-performance inference
- Seek cost-efficient deployment
- Conduct research on inference frameworks
Since v0.2.2: Supports CPU+GPU heterogeneous inference
CPU-only support: Planned feature