
Commit a15df1d

Add five examples for training/inference speedup

2 parents 5e9cfa4 + d3c78e4

File tree

1,248 files changed: +155,879 −0 lines

examples/FR-Spec/.gitignore (218 additions, 0 deletions)

```gitignore
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/latest/usage/project/#working-with-version-control
.pdm.toml
.pdm-python
.pdm-build/

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/


build/
# Cuda
*.i
*.ii
*.gpu
*.ptx
*.cubin
*.fatbin

# Prerequisites
*.d

# Compiled Object files
*.slo
*.lo
*.o
*.obj

# Precompiled Headers
*.gch
*.pch

# Compiled Dynamic libraries
*.so
*.dylib
*.dll

# Fortran module files
*.mod
*.smod

# Compiled Static libraries
*.lai
*.la
*.a
*.lib

# Executables
*.exe
*.out
*.app

.DS_Store
.vscode

# Checkpoints
checkpoints/*
!checkpoints/.gitkeep

tests/*

models/*
!models/.gitkeep
```
examples/FR-Spec/.gitmodules (3 additions, 0 deletions)

```gitmodules
[submodule "src/cutlass"]
	path = src/cutlass
	url = https://github.com/NVIDIA/cutlass
```

examples/FR-Spec/README.md (123 additions, 0 deletions)
@@ -0,0 +1,123 @@
1+
# FR-Spec: Frequency-Ranked Speculative Sampling
2+
3+
[![arXiv](https://img.shields.io/badge/arXiv-2502.14856-b31b1b.svg)](https://arxiv.org/abs/2502.14856) [![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
4+
5+
## Introduction
6+
7+
> This is the C/CUDA implementation for FR-Spec
8+
9+
Surprisingly, EAGLE-2's bottleneck is LM-Head.
10+
11+
Leveraging the 'long-tail' property of token distribution, we achieve a **1.12x speedup** over EAGLE-2.
12+
13+
Our method is simple to implement, preserves generation quality, and requires no retraining.
14+
15+
👉 **[Read our paper](https://arxiv.org/abs/2502.14856)**
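The long-tail idea can be sketched in a few lines: instead of computing the draft model's softmax over the full vocabulary, restrict it to a fixed subset of high-frequency token ids, while the target model still verifies drafts over the full vocabulary. A minimal, framework-free sketch, where `logits` and `frequent_ids` are illustrative stand-ins rather than the repository's actual API:

```python
import math

def fr_draft_distribution(logits, frequent_ids):
    """Softmax over only a frequency-ranked subset of the vocabulary.

    The draft model's LM-head cost is dominated by the full-vocabulary
    projection, but most drafted tokens fall in a small 'head' of frequent
    tokens. Restricting the draft softmax to that subset shrinks the
    LM-head computation; verification by the target model is unchanged.
    """
    sub = [logits[i] for i in frequent_ids]
    m = max(sub)                                  # for numerical stability
    exps = [math.exp(x - m) for x in sub]
    z = sum(exps)
    return {tok: e / z for tok, e in zip(frequent_ids, exps)}

# Toy example: a vocabulary of 6 tokens, drafting over the 3 most frequent ids.
logits = [2.0, 0.5, -1.0, 3.0, 0.0, -2.0]
frequent_ids = [0, 3, 4]                          # e.g. top-3 by corpus frequency
dist = fr_draft_distribution(logits, frequent_ids)
best = max(dist, key=dist.get)                    # greedy draft token -> 3
```

In the real implementation this restriction is applied inside the CUDA LM-head kernel, not by slicing logits after a full projection; the sketch only shows the probability model.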
## Decoding Speed

<div align="center">
<img src="assets/speed_compare.png" alt="FR-Spec Architecture" width="800px">
</div>

Decoding speed (tokens/s) of FR-Spec and EAGLE-2 for Llama3-8B and Llama3.2-1B under different frameworks.
## News

**2025.05.29** Our follow-up work, a systematic analysis of combining speculative decoding with quantization ([paper](https://arxiv.org/pdf/2505.22179)).

**2025.05.15** FR-Spec accepted to ACL 2025 (main conference).

**2025.03.03** Feature merged into SGLang ([link](https://docs.sglang.ai/backend/speculative_decoding.html#EAGLE-Decoding-via-Frequency-Ranked-Speculative-Sampling)).

**2025.03.01** Implementation framework released.

**2025.02.26** Token-frequency statistics released.
## Installation from source

```bash
conda create -n fr-spec python==3.11 && conda activate fr-spec
# install pytorch for your platform, see https://pytorch.org
git clone https://github.com/thunlp/FR-Spec.git --recursive && cd FR-Spec
vim setup.py # change arch="80" to the code for your platform, see https://developer.nvidia.com/cuda-gpus#compute
pip install .
```
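The `arch="80"` value in `setup.py` is the two-digit SM code of your GPU's compute capability. If you are unsure of yours, a small helper can derive it; `sm_arch_code` is a hypothetical function for illustration, not part of the repository. With PyTorch installed, you could feed it `torch.cuda.get_device_capability()`:

```python
def sm_arch_code(major: int, minor: int) -> str:
    """Map a CUDA compute capability, e.g. (8, 0) for A100, to the
    two-digit code expected by arch="..." in setup.py.

    Illustrative helper only; see
    https://developer.nvidia.com/cuda-gpus#compute for the full table.
    """
    return f"{major}{minor}"

# Examples: A100 is CC 8.0 -> "80"; RTX 4090 is CC 8.9 -> "89"; H100 is CC 9.0 -> "90".
```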
## Evaluation

### Model Weights

Download the corresponding model weights and save them in the `models` folder.

### Prepare the FR-Spec vocabulary subset

You can download our processed token-frequency statistics:

- [LLaMA3-Instruct-8B-FR-Spec](https://huggingface.co/thunlp/LLaMA3-Instruct-8B-FR-Spec)
- [LLaMA3.2-Instruct-1B-FR-Spec](https://huggingface.co/thunlp/LLaMA3.2-Instruct-1B-FR-Spec)
Alternatively, you can generate your own token-frequency statistics with our script:

```bash
cd fr
python fr.py --model_name <model_name> --model_path <model_path> --num_lines <num_lines> --vocab_size <vocab_size>
```

- `model_name`: The name of the model (e.g. `llama3-8b-instruct`).
- `model_path`: The path to the model (e.g. `meta-llama/Meta-Llama-3-8B-Instruct`).
- `num_lines`: Number of lines to process from the SlimPajama dataset. Defaults to `1000000`.
- `vocab_size`: A list of vocabulary sizes to process. Each size represents a subset of the most frequent tokens to keep. Defaults to `[8192, 16384, 32768, 65536]`.
An example command for generating token-frequency statistics from 1 million lines of the SlimPajama dataset for the Llama-3-8B-Instruct model:

```bash
python fr.py --model_name llama3-8b-instruct --model_path meta-llama/Meta-Llama-3-8B-Instruct --num_lines 1000000 --vocab_size <vocab_size>
```

The script analyzes the token-frequency distribution across `num_lines` of the SlimPajama corpus and saves the most frequent tokens (as specified by `vocab_size`) to the corresponding directory in `fr-index`. Copy the generated token-frequency files to the corresponding FR-Spec model folder to use them in your experiments.
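The statistics step boils down to counting token-id occurrences over a corpus sample and keeping the top `vocab_size` ids. A minimal sketch under stated assumptions: the real `fr.py` tokenizes SlimPajama lines with the model's tokenizer, and `top_k_token_ids` is a hypothetical helper, not the script's actual function:

```python
from collections import Counter

def top_k_token_ids(token_id_stream, k):
    """Count token occurrences and return the k most frequent ids.

    Hypothetical stand-in for fr.py's counting step; ties are broken
    by first occurrence, as with Counter.most_common.
    """
    counts = Counter(token_id_stream)
    return [tok for tok, _ in counts.most_common(k)]

# Toy stream of pre-tokenized ids; real vocab_size values run 8192..65536.
stream = [5, 1, 5, 2, 5, 1, 9, 1, 5]
subset = top_k_token_ids(stream, 2)   # -> [5, 1]
```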
**🌟 Welcome:** We encourage you to upload your processed vocabularies for other models to HuggingFace (model name suffixed with FR-Spec).
### Run Evaluation

All evaluation scripts are located in the `scripts` folder. Here we use Llama-3-8B-Instruct as an example:

```bash
# 1. Run evaluations
bash scripts/<benchmark>/llama3-8b-instruct/run_baseline.sh
bash scripts/<benchmark>/llama3-8b-instruct/run_eagle.sh
bash scripts/<benchmark>/llama3-8b-instruct/run_eagle_fr_spec.sh

# 2. Evaluate speed
bash scripts/<benchmark>/llama3-8b-instruct/speed_up.sh

# 3. Check correctness (for human_eval and gsm8k only)
bash scripts/<benchmark>/llama3-8b-instruct/check_correctness.sh
```

Replace `<benchmark>` with one of: `spec_bench`, `human_eval`, or `gsm8k`.
## Contributors

- [Weilin Zhao](https://github.com/Achazwl)
- [Yudi Zhang](https://github.com/YudiZh)
- [Tengyu Pan](https://github.com/ThonyPan)

## Acknowledgment

Our experiments are based on https://github.com/SafeAILab/EAGLE and https://github.com/FasterDecoding/Medusa.

The `evaluation/` folder is modified based on https://github.com/hemingkx/Spec-Bench.

The `src/flash_attn/` folder is modified based on https://github.com/Dao-AILab/flash-attention/blob/v2.4.2/csrc/flash_attn.
## Citation

```
@article{zhao2025fr,
  title={FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling},
  author={Zhao, Weilin and Pan, Tengyu and Han, Xu and Zhang, Yudi and Sun, Ao and Huang, Yuxiang and Zhang, Kaihuo and Zhao, Weilun and Li, Yuxuan and Wang, Jianyong and others},
  journal={arXiv preprint arXiv:2502.14856},
  year={2025}
}
```
Binary files (826 KB) not shown.
