{
"lesson": "31-agent-workbench-why-models-fail",
"title": "Agent Workbench Engineering: Why Capable Models Still Fail",
"questions": [
{
"stage": "pre",
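"question": "What does this lesson identify as the root cause of agents failing on real tasks?",
"options": [
"Too few model parameters",
"Workbench failure: surfaces around the model are missing, rather than a limitation of the LLM",
"A slow network",
"Stale training data"
],
"correct": 1,
"explanation": "The model did not get Python wrong; it got the job wrong. The surfaces around the model were missing."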
},
{
"stage": "pre",
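"question": "What are the seven workbench surfaces this lesson names?",
"options": [
"Instructions, state, scope, feedback, verification, review, handoff",
"Train, eval, deploy, monitor, retrain, scale, retire",
"Plan, act, reflect, refine, debate, vote, ship",
"Read, write, exec, fork, exit, wait, kill"
],
"correct": 0,
"explanation": "The seven surfaces are instructions, state, scope, feedback, verification, review, and handoff."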
},
{
"stage": "check",
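"question": "Which of the following is NOT one of the eight distributed-systems primitives this lesson maps the surfaces onto?",
"options": [
"Function",
"Worker",
"Trigger",
"Backpropagation"
],
"correct": 3,
"explanation": "The eight primitives are function, worker, trigger, runtime, HTTP/RPC, queue, session persistence, and authorization policy."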
},
{
"stage": "check",
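"question": "Vercel's reported harness change raised the success rate from what to what?",
"options": [
"20% to 60%",
"80% to 100%, by removing 80% of the agent's tools",
"0% to 100%, by switching models",
"50% to 70%, by adding RAG"
],
"correct": 1,
"explanation": "Removing 80% of the tools raised Vercel's agent from an 80% to a 100% success rate."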
},
{
"stage": "check",
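"question": "What does Terminal Bench 2.0 show about model vs. harness?",
"options": [
"Only the model itself determines the ranking",
"Changing only the harness moved the same model from outside the top 30 to fifth place",
"Harness changes do not matter",
"Only the GPU matters"
],
"correct": 1,
"explanation": "LangChain's \"Anatomy of an Agent Harness\": same model, different harness, and the ranking jumps 25+ places."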
},
{
"stage": "post",
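"question": "What does this lesson recommend doing when you hear new harness vocabulary?",
"options": [
"Adopt the vocabulary as-is",
"Before adopting it, translate it back into primitives (function, worker, trigger, runtime, HTTP/RPC, queue, persistence, policy)",
"Reject it",
"Wait for OpenAI to standardize it"
],
"correct": 1,
"explanation": "Reason from primitives, not from vendor taxonomies; the vocabulary changes, but the engineering does not."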
},
{
"stage": "post",
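"question": "Where does chat history stand relative to the workbench?",
"options": [
"Chat is the system of record",
"Chat is volatile; the repo is the system of record",
"The two are equivalent",
"Neither matters"
],
"correct": 1,
"explanation": "The loop closes on the state file, not on chat history."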
}
]
}