{
"lesson": "12-edge-inference",
"title": "Edge Inference: Apple Neural Engine, Qualcomm Hexagon, WebGPU/WebLLM, Jetson",
"questions": [
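{
"stage": "pre",
"question": "According to this lesson, what is the core constraint that makes LLM inference on mobile devices slower than in the data center?",
"options": [
"Wi-Fi latency",
"Compute throughput",
"Memory bandwidth (50-90 GB/s for mobile DRAM versus 2-3 TB/s for HBM3)",
"Storage capacity"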
],
"correct": 2,
"explanation": ""
},
{
"stage": "check",
"question": "Why can Apple's Neural Engine avoid CPU-NPU copy overhead?",
"options": [
"It uses PCIe 5.0",
"Core ML disables the KV cache",
"It transcodes the weights to FP4 before copying",
"Apple Silicon uses unified memory: the CPU and the ANE share the same memory pool"
],
"correct": 3,
"explanation": ""
},
{
"stage": "check",
"question": "Which quantization format does this lesson recommend for WebGPU + WebLLM in the browser?",
"options": [
"NVFP4",
"FP8",
"Q4 MLC (q4f16_1), compiled with mlc_llm convert_weight",
"GGUF Q4_K_M"
],
"correct": 2,
"explanation": ""
},
{
"stage": "check",
"question": "Roughly what level of WebGPU coverage on mobile does this lesson report for 2026?",
"options": [
"100% across all browsers",
"iOS Safari only",
"Less than 10%",
"About 70-75%, with Firefox on Android still catching up"
],
"correct": 3,
"explanation": ""
},
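{
"stage": "post",
"question": "Why is holding a 128K context impractical on a typical phone?",
"options": [
"The tokenizer fails beyond 8K",
"Model weights plus even a 32K-token KV cache plus operating system overhead can easily exceed an 8 GB memory budget",
"WebGPU caps the context at 4K",
"iOS forbids long contexts"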
],
"correct": 1,
"explanation": ""
},
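{
"stage": "post",
"question": "Why is voice highlighted as the killer application for edge inference?",
"options": [
"Voice agents are latency-sensitive (first token < 500 ms), and local inference eliminates network latency entirely",
"Voice models do not need a KV cache",
"Voice models always fit in 50 MB",
"Voice only runs in WebAssembly"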
],
"correct": 0,
"explanation": ""
}
]
}