Evaluation by Users #104

---

GPU: NVIDIA H20
Chitu version: v0.4.2
Model: Qwen3-32B

---

Model: Qwen3-32B

---

Others:

---

Benchmark configuration
Result set 1
Result set 2
P.S.: Results for batch sizes of 8 or more are currently unavailable due to device issues; they may be added later.

---

Environment Config:
Inference Config:
Benchmark Config:
Results:

---

Device: NVIDIA H20

---

Performance Test Report for gpt-oss-20b-BF16 on Hygon DCU
1. Test Environment
2. Test Configuration
3. Performance Data
The table below shows the model's performance metrics at different batch sizes.
4. Problem Description
When running the test with a batch size of 32, the program crashed with the following error: `IndexError: list index out of range`

---

Device: NVIDIA L40S x 4

---

3. Benchmark Config:
4. Results:

---

System and Hardware Configuration
Software and Frameworks
Startup Command
Runtime Error
Explanation
My GPUs are Tesla V100s (compute capability 7.0, sm_70), which do not support native BF16 instructions.
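The error comes down to hardware support. Native BF16 arithmetic arrived with compute capability 8.0 (Ampere), so sm_70 parts such as the V100 have to fall back to FP16. Below is a minimal sketch of that dtype choice. The helper names are hypothetical and do not come from Chitu. In a PyTorch environment, the `(major, minor)` tuple could be read with `torch.cuda.get_device_capability()`.

```python
# Hypothetical helpers: pick a compute dtype from a CUDA compute capability.
# Assumption: native BF16 support requires compute capability 8.0 (Ampere) or newer.

BF16_MIN_CAPABILITY = (8, 0)


def supports_native_bf16(capability: tuple[int, int]) -> bool:
    """True if the (major, minor) compute capability is at least 8.0."""
    return capability >= BF16_MIN_CAPABILITY


def pick_dtype(capability: tuple[int, int]) -> str:
    """Return 'bfloat16' where BF16 is native, otherwise fall back to 'float16'."""
    return "bfloat16" if supports_native_bf16(capability) else "float16"


if __name__ == "__main__":
    # Compute capabilities of some GPUs that appear in this thread.
    gpus = {
        "Tesla V100": (7, 0),
        "A100": (8, 0),
        "RTX 3090": (8, 6),
        "RTX 4090": (8, 9),
        "H20": (9, 0),
    }
    for name, cap in gpus.items():
        print(f"{name} (sm_{cap[0]}{cap[1]}): {pick_dtype(cap)}")
```

Python compares tuples element by element, so `(7, 0) >= (8, 0)` is false and `(8, 6) >= (8, 0)` is true. No string parsing of version numbers is needed.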

---

Environment Config:
Inference Config:
Benchmark Config:
Result:

---

Qwen3-0.6B Performance Benchmark on RTX 5090
Environment
Benchmark Configuration
Results

---

Qwen3-8B Performance Benchmark on RTX 3090
Environment
Benchmark Configuration
Results

---

Qwen3-8B Serving Benchmark on RTX 4090
Environment
Benchmark Configuration
Launch Command
```shell
torchrun --nnodes 1 \
  --nproc_per_node 1 \
  --master_port=22525 \
  -m chitu \
  serve.port=21002 \
  infer.cache_type=paged \
  infer.pp_size=1 \
  infer.tp_size=1 \
  models=Qwen3-8B \
  models.ckpt_dir=/root/autodl-tmp/model/Qwen3-8B \
  infer.mla_absorb=absorb-without-precomp \
  infer.raise_lower_bit_float_to=bfloat16 \
  infer.max_reqs=4 \
  infer.max_seq_len=1024 \
  request.max_new_tokens=100 \
  infer.use_cuda_graph=True
```
Results
Time to First Token (TTFT)
Time per Output Token (TPOT, excl. 1st)
Inter-token Latency (ITL)
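The three latency metrics are usually defined as follows:

- TTFT is the delay until the first output token arrives.
- TPOT averages the decode time over the remaining tokens.
- ITL collects every gap between consecutive tokens.

The sketch below assumes the definitions used by vLLM's `benchmark_serving.py`, which prints metrics under these same names. Chitu's own tooling may compute them differently. The sketch takes per-token arrival timestamps for each request, and all helper names are illustrative.

```python
# Sketch: TTFT / TPOT / ITL summary from per-token arrival timestamps (seconds).
# Assumes vLLM-style definitions; all reported values are in milliseconds.
from statistics import mean, median


def percentile(values: list[float], q: float) -> float:
    """Linear-interpolated percentile (same rule as numpy's default)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def summarize(requests: list[tuple[float, list[float]]]) -> dict[str, float]:
    """requests: (send_time, [arrival time of each output token]) per request.

    Every request must have produced at least one token.
    """
    ttfts, tpots, itls = [], [], []
    for start, times in requests:
        ttfts.append((times[0] - start) * 1000)
        if len(times) > 1:
            # TPOT excludes the first token: decode time / remaining tokens.
            tpots.append((times[-1] - times[0]) / (len(times) - 1) * 1000)
            itls.extend((b - a) * 1000 for a, b in zip(times, times[1:]))
    out = {}
    for name, vals in (("TTFT", ttfts), ("TPOT", tpots), ("ITL", itls)):
        if not vals:
            continue  # e.g. no request produced a second token
        out[f"Mean {name} (ms)"] = mean(vals)
        out[f"Median {name} (ms)"] = median(vals)
        out[f"P99 {name} (ms)"] = percentile(vals, 99)
    return out


if __name__ == "__main__":
    # Two toy requests: first token after 40 ms / 50 ms, then one token every 10 ms.
    reqs = [
        (0.0, [0.040, 0.050, 0.060, 0.070]),
        (0.0, [0.050, 0.060, 0.070]),
    ]
    for key, value in summarize(reqs).items():
        print(f"{key}: {value:.2f}")
```

For a single request, the mean of its ITLs equals its TPOT. The two metrics diverge only in how the values are pooled across requests and in their tails. TPOT takes one value per request, while ITL pools every gap from every request.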

---

Test environment information

---

Device Information
Hardware
2.2.2 Software
Server Config
```shell
export WORLD_SIZE=1
torchrun --nnodes 1 \
  --nproc_per_node 1 \
  --master_port=22525 \
  -m chitu \
  serve.port=21002 \
  infer.cache_type=paged \
  infer.pp_size=1 \
  infer.tp_size=1 \
  models=Qwen3-0.6B \
  models.ckpt_dir=/workspace/qwen3 \
  infer.max_reqs=16 \
  infer.max_seq_len=4096 \
  request.max_new_tokens=100
```
Benchmark Config: iteration=10 input_len=128 output_len=1024 warmup=3
Benchmark Summary
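The benchmark config (`iteration=10`, `warmup=3`) follows a common pattern. A few untimed warm-up runs absorb one-off costs such as CUDA graph capture and memory allocation, and then the timed runs are averaged. Below is a minimal sketch of that pattern. It is not the harness actually used here. `run_once` is a hypothetical callable that runs one generation pass (for example, 128 input tokens and 1024 output tokens) and returns the number of tokens it generated.

```python
# Sketch of a warm-up + timed-iteration benchmark loop.
# `run_once` is a hypothetical callable: it runs one generation pass and
# returns how many output tokens it produced.
import time
from typing import Callable


def benchmark(run_once: Callable[[], int], iteration: int = 10, warmup: int = 3) -> float:
    """Return the mean output throughput (tokens/s) over the timed iterations."""
    for _ in range(warmup):
        run_once()  # untimed: absorbs one-off startup costs
    rates = []
    for _ in range(iteration):
        start = time.perf_counter()
        tokens = run_once()
        elapsed = time.perf_counter() - start
        rates.append(tokens / elapsed)
    return sum(rates) / len(rates)


if __name__ == "__main__":
    def fake_run() -> int:
        time.sleep(0.01)  # stand-in for a real generation call
        return 1024       # pretend output_len=1024 tokens were generated

    print(f"{benchmark(fake_run):.0f} tok/s")
```

The loop uses `time.perf_counter()` rather than `time.time()` because it is monotonic and high-resolution. Both properties matter when individual runs are short.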

---

Qwen3-32B Performance Benchmark on Ascend 910B3
Environment
Benchmark Configuration
Results

---

Benchmark: Qwen3-8B on NVIDIA A100 40GB
Environment
Benchmark Configuration

---

[Evaluation] Qwen2.5-7B-Instruct Performance Benchmark on NVIDIA RTX 5090
Environment
Model: Qwen2.5-7B-Instruct
| Batch size | TPS (tok/s) | TTFT (ms) | TPOT (ms) | Total token throughput (tok/s) |
|---|---|---|---|---|
| 1 | 99.19 | 65.29 | 9.97 | 204.00 |
| 4 | 370.48 | 172.79 | 10.47 | 761.94 |
| 8 | 706.03 | 348.99 | 10.66 | 1452.05 |
| 16 | 1225.32 | 544.29 | 12.00 | 2520.05 |
| 32 | 2057.94 | 891.02 | 13.80 | 4232.44 |
| 64 | 2069.09 | 1521.45 | 21.44 | 4255.37 |
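One quick way to read the RTX 5090 table is to compare each batch size's TPS with what perfect linear scaling of the batch-size-1 result would give (bs × 99.19 tok/s). The sketch below does that arithmetic on the TPS column as reported. It assumes TPS is the aggregate throughput across the whole batch.

```python
# Scaling efficiency of the TPS column: TPS(bs) / (bs * TPS(1)).
# Values copied from the Qwen2.5-7B-Instruct / RTX 5090 table above.
TPS = {1: 99.19, 4: 370.48, 8: 706.03, 16: 1225.32, 32: 2057.94, 64: 2069.09}


def scaling_efficiency(tps: dict[int, float]) -> dict[int, float]:
    """Fraction of ideal linear scaling achieved at each batch size."""
    base = tps[1]
    return {bs: value / (bs * base) for bs, value in tps.items()}


if __name__ == "__main__":
    for bs, eff in scaling_efficiency(TPS).items():
        print(f"bs={bs:>2}: {eff:.1%} of linear scaling")
    gain = TPS[64] / TPS[32] - 1
    print(f"bs 32 -> 64 throughput gain: {gain:.1%}")
```

By these numbers, throughput is nearly flat from bs = 32 to bs = 64 (about +0.5%), while TPOT rises from 13.80 ms to 21.44 ms.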

---

Here are the benchmark results for Qwen3-0.6B running on Chitu.
Environment
Benchmark Configuration
Results

---

Performance Test Report for Qwen3-0.6B-BF16 on Muxi GPU
1. Test Environment
2. Test Command
3. Performance Data
This table contains the real-time metrics captured by the Chitu throughput monitor during the stable execution phase (Iter 3).
4. Problem Description
Tested on a 16 GB VRAM partition. Using the default

---

CUDA: 12.4
GPU: NVIDIA A40

---

Performance Test Report for Qwen3-0.6B on RTX 4090
Environment
Server Config
```shell
torchrun --nnodes 1 \
  --nproc_per_node 4 \
  --master_port=22525 \
  -m chitu \
  serve.port=21002 \
  infer.cache_type=paged \
  infer.pp_size=1 \
  infer.tp_size=4 \
  models=Qwen3-0.6B \
  models.ckpt_dir=/data3/qsy/models/qwen3-0.6b/ \
  infer.mla_absorb=absorb-without-precomp \
  infer.raise_lower_bit_float_to=bfloat16 \
  infer.max_reqs=4 \
  infer.max_seq_len=4096 \
  request.max_new_tokens=100 \
  infer.use_cuda_graph=True
```
Result
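This command pairs `--nproc_per_node 4` with `infer.tp_size=4` and `infer.pp_size=1`. For a launch that uses only tensor and pipeline parallelism (no data parallelism, which is an assumption here), torchrun's world size should equal `tp_size × pp_size`. The hypothetical helper below checks that condition before launching. It is not part of Chitu.

```python
# Hypothetical pre-launch check, assuming world size = tp_size * pp_size
# (tensor + pipeline parallelism only, no data parallelism).


def check_parallel_layout(nnodes: int, nproc_per_node: int, tp_size: int, pp_size: int) -> int:
    """Return the world size, or raise ValueError if the layout does not fit it."""
    world_size = nnodes * nproc_per_node
    if tp_size * pp_size != world_size:
        raise ValueError(
            f"tp_size*pp_size = {tp_size * pp_size}, but torchrun starts {world_size} processes"
        )
    return world_size


if __name__ == "__main__":
    # Layout from the RTX 4090 command above: 1 node x 4 processes, TP=4, PP=1.
    print(check_parallel_layout(nnodes=1, nproc_per_node=4, tp_size=4, pp_size=1))
```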

---

Qwen3-32B Performance Benchmark on Ascend 910B2
Environment
Benchmark Configuration
Results

---

Chitu Engine Performance Evaluation Report
1. Hardware and Software Configuration
2. Chitu Version
3. Evaluation Methodology
4. Performance and Accuracy Data
[Performance Data]
[Accuracy Data]

---

Hardware and Software Configuration:
Evaluation Methodology:
📝 Chinese description

---

The Performance of the Qwen2.5-7B-Instruct Model in

| Metrics | Successful requests | Benchmark duration (s) | Total input tokens | Total generated tokens | Request throughput (req/s) | Output token throughput (tok/s) | Total Token throughput (tok/s) |
|---|---|---|---|---|---|---|---|
| Value | 10 | 87.42 | 1570 | 8670 | 0.11 | 99.17 | 117.13 |

Time to First Token

| Metrics | Mean TTFT (ms) | Median TTFT (ms) | P99 TTFT (ms) |
|---|---|---|---|
| Value | 37.82 | 35.12 | 52.09 |

Time per Output Token (excl. 1st token)

| Metrics | Mean TPOT (ms) | Median TPOT (ms) | P99 TPOT (ms) |
|---|---|---|---|
| Value | 10.03 | 10.03 | 10.06 |

Inter-token Latency

| Metrics | Mean ITL (ms) | Median ITL (ms) | P99 ITL (ms) |
|---|---|---|---|
| Value | 12.12 | 10.01 | 12.44 |
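The derived columns in the summary table follow from the raw counts:

- Request throughput is successful requests divided by benchmark duration.
- Output token throughput is generated tokens divided by duration.
- Total token throughput adds the input tokens to the generated tokens before dividing.

The sketch below recomputes these values from the reported numbers. The class name is illustrative. The results agree with the table to within rounding of the printed duration: 8670 / 87.42 ≈ 99.18, against the reported 99.17.

```python
# Recompute the derived throughput metrics from the raw benchmark counts.
from dataclasses import dataclass


@dataclass
class RunSummary:
    successful_requests: int
    duration_s: float
    total_input_tokens: int
    total_generated_tokens: int

    def request_throughput(self) -> float:
        return self.successful_requests / self.duration_s

    def output_token_throughput(self) -> float:
        return self.total_generated_tokens / self.duration_s

    def total_token_throughput(self) -> float:
        # Input (prompt) and generated tokens both count toward total throughput.
        return (self.total_input_tokens + self.total_generated_tokens) / self.duration_s


if __name__ == "__main__":
    # Values from the Qwen2.5-7B-Instruct summary table above.
    run = RunSummary(successful_requests=10, duration_s=87.42,
                     total_input_tokens=1570, total_generated_tokens=8670)
    print(f"Request throughput:      {run.request_throughput():.2f} req/s")
    print(f"Output token throughput: {run.output_token_throughput():.2f} tok/s")
    print(f"Total token throughput:  {run.total_token_throughput():.2f} tok/s")
```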

---

Environment
Server Config
Bench Config
Result

---

GPU: NVIDIA H100
Model: Qwen3-32B

---

We appreciate you sharing your evaluation results.
Please provide the following information:
An English description is mandatory; a Chinese description is optional.