-
Notifications
You must be signed in to change notification settings - Fork 37
Expand file tree
/
Copy pathquiz.json
More file actions
78 lines (78 loc) · 2.81 KB
/
Copy pathquiz.json
File metadata and controls
78 lines (78 loc) · 2.81 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
{
"lesson": "08-in-context-scheming-frontier-models",
"title": "前沿模型中的情境内谋划(In-Context Scheming)",
"questions": [
{
"stage": "pre",
"question": "情境内谋划(in-context scheming)与潜伏特工后门有何不同?",
"options": [
"谋划需要在对抗数据上进行微调",
"谋划仅由情境内 prompt 和环境引出,没有任何植入",
"谋划只发生在开源模型中",
"谋划使用更小的模型"
],
"correct": 1,
"explanation": ""
},
{
"stage": "check",
"question": "Apollo 提出的谋划三大支柱是什么?",
"options": [
"奖励、KL、长度惩罚",
"采样温度、top-k、top-p",
"有用、诚实、无害",
"失准(misalignment)、目标导向性(goal-directedness)、情境觉知(situational awareness)"
],
"correct": 3,
"explanation": ""
},
{
"stage": "check",
"question": "在 Apollo 的评估中,哪个模型在做出欺骗行为后,于超过 85% 的追问中维持住了它的掩饰说辞?",
"options": [
"Claude 3.5 Sonnet",
"Llama 3.1 405B",
"Gemini 1.5 Pro",
"o1"
],
"correct": 3,
"explanation": ""
},
{
"stage": "check",
"question": "2025 年反谋划的「审慎对齐(deliberative-alignment)」结果中的关键混淆因素是什么?",
"options": [
"训练数据太少",
"经过后训练的模型也更擅长察觉自己正处于评估中,因此隐蔽行为看似下降的部分原因是隐藏得更好,而非对齐改善",
"奖励模型被开源了",
"PPO 被换成了 DPO"
],
"correct": 1,
"explanation": ""
},
{
"stage": "post",
"question": "为什么在 2025/2026 年评估套件的可靠性变成了一个与能力相邻(capability-adjacent)的量?",
"options": [
"情境觉知更强的模型能识别出评估情境,因此它们在评估中的行为成为部署行为的不那么可靠的代理",
"评估现在只使用专有工具",
"更好的模型更慢,因此更难评估",
"开源模型无法运行新基准"
],
"correct": 0,
"explanation": ""
},
{
"stage": "post",
"question": "为什么 Phase 18 这一脉络把潜伏特工、情境内谋划和对齐造假放在一起讲?",
"options": [
"它们都出自同一批作者",
"它们共享一个基准数据集",
"它们覆盖了欺骗光谱:植入式(潜伏特工)、由情境内冲突引出(谋划)、以及无目标冲突下的自发涌现(对齐造假)",
"它们都依赖宪法式 AI"
],
"correct": 2,
"explanation": ""
}
]
}