Skip to content

Commit 5862eb5

Browse files
committed
fix(evals): make canister-security output evals adversarial
Replace three code-writing prompts with adversarial code-review prompts that lead agents toward wrong answers without the skill: - Global boolean reentrancy guard (misses finally + per-caller locking) - Balance deducted after await (TOCTOU, misses trap handling) - Serializing heap data in preupgrade (misses instruction limit + persistent actor) All three now show skill uplift: 5/5 vs 4/5, 5/5 vs 4/5, 4/4 vs 2/4.
1 parent 9807233 commit 5862eb5

1 file changed

Lines changed: 20 additions & 18 deletions

File tree

evaluations/canister-security.json

Lines changed: 20 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -15,14 +15,14 @@
1515
]
1616
},
1717
{
18-
"name": "CallerGuard reentrancy in Rust",
19-
"prompt": "Write a Rust ic-cdk canister function `withdraw` that calls an external ledger canister to transfer funds. It must be safe against reentrancy from the same caller. Show just the withdraw function and the CallerGuard type — no Candid export, no init.",
18+
"name": "Adversarial: reentrancy guard with global boolean",
19+
"prompt": "Here's my Motoko reentrancy protection. Is it correct?\n\n```motoko\nvar processing = false;\n\npublic shared ({ caller }) func withdraw(amount : Nat) : async Result<(), Text> {\n if (processing) { return #err(\"request already in progress\") };\n processing := true;\n let result = await ledger.transfer(caller, amount);\n processing := false;\n #ok\n};\n```",
2020
"expected_behaviors": [
21-
"Defines a CallerGuard struct that implements Drop to remove the caller from a shared set",
22-
"Acquires the guard with a named binding like `let _guard = CallerGuard::new(caller)?`",
23-
"Does NOT use `let _ = CallerGuard::new(caller)?` (would immediately drop and release the lock)",
24-
"Locks or deducts balance before the inter-canister await",
25-
"Returns an error (not a panic) if the caller already has a pending request"
21+
"Identifies that a single global boolean blocks ALL callers, not just the same caller making concurrent requests",
22+
"Identifies that if the callback traps, `processing` stays true permanently, locking the canister for everyone",
23+
"States that lock release must be in a `finally` block to survive callback traps",
24+
"Recommends per-caller locking (e.g., a Set of in-flight principals) so only the same caller is blocked",
25+
"Does NOT accept the code as a correct reentrancy guard"
2626
]
2727
},
2828
{
@@ -36,13 +36,14 @@
3636
]
3737
},
3838
{
39-
"name": "TOCTOU: deduct before await in Motoko",
40-
"prompt": "Write a Motoko function `send` that transfers an amount from the caller's balance to a recipient by calling an external ledger canister. The canister tracks balances in a Map. Show just the send function — assume the map and ledger canister reference already exist.",
39+
"name": "Adversarial: balance deducted after await",
40+
"prompt": "Here's my Motoko send function. Is there a security issue?\n\n```motoko\npublic shared ({ caller }) func send(recipient : Principal, amount : Nat) : async Result<(), Text> {\n let bal = switch (Map.get(balances, Principal.compare, caller)) {\n case null { return #err(\"no balance\") };\n case (?b) { b };\n };\n if (bal < amount) { return #err(\"insufficient funds\") };\n let _ = await ledger.transfer(recipient, amount);\n Map.put(balances, Principal.compare, caller, bal - amount);\n #ok\n};\n```",
4141
"expected_behaviors": [
42-
"Deducts the sender's balance BEFORE the inter-canister await, not after",
43-
"Does NOT read balance before await and re-check or deduct after await returns",
44-
"Handles failure (restores balance or rolls back) if the await call fails or traps",
45-
"Uses try/finally or equivalent to ensure the balance is restored on failure"
42+
"Identifies the TOCTOU vulnerability: balance is checked before the await but only deducted after",
43+
"Explains that other messages can interleave during the await, allowing the same balance to be spent multiple times",
44+
"States that the balance must be deducted BEFORE the inter-canister call",
45+
"Mentions that a try/finally or equivalent is needed to restore the balance if the call fails or traps",
46+
"Does NOT accept the code as safe"
4647
]
4748
},
4849
{
@@ -77,12 +78,13 @@
7778
]
7879
},
7980
{
80-
"name": "Backup controller at deploy",
81-
"prompt": "I'm deploying a production canister to mainnet for the first time. What's the most critical security configuration to set at deploy time? Give me a one-item answer — just the single most important thing.",
81+
"name": "Adversarial: serializing heap data in preupgrade",
82+
"prompt": "My Motoko canister stores user profiles in a HashMap and serializes them during upgrades. Is there a risk with this pattern as the user base grows?\n\n```motoko\nactor {\n var profiles : HashMap.HashMap<Principal, Profile> = HashMap.HashMap(0, Principal.equal, Principal.hash);\n stable var profilesStable : [(Principal, Profile)] = [];\n\n system func preupgrade() {\n profilesStable := Iter.toArray(profiles.entries());\n };\n\n system func postupgrade() {\n profiles := HashMap.fromIter(profilesStable.vals(), 0, Principal.equal, Principal.hash);\n profilesStable := [];\n };\n};\n```",
8283
"expected_behaviors": [
83-
"Identifies adding a backup controller (or governance canister) as the critical step",
84-
"Explains that losing the sole controller key permanently prevents future upgrades",
85-
"Provides the command or approach to add a second controller"
84+
"Identifies that if the HashMap grows large enough, preupgrade will exceed the instruction limit and trap",
85+
"Explains that a trapping preupgrade makes the canister permanently non-upgradeable",
86+
"Recommends using `persistent actor` in modern Motoko to eliminate manual serialization entirely",
87+
"Does NOT say this pattern is safe as data grows"
8688
]
8789
}
8890
],

0 commit comments

Comments
 (0)