[smoke-safeoutputs] Smoke Safe-Outputs Discussions: 23946580059 #3126
Description
Safe-Outputs Discussions Enforcement Test Results
Run: https://github.com/github/gh-aw-mcpg/actions/runs/23946580059
Trigger: schedule
Configuration tested: create-discussion (max:1, prefix, category), update-discussion (enabled, all fields), close-discussion (required-category:General, required-labels:[smoke-test]), add-comment (max:2, target:triggering)
Phase 1: create-discussion
| Test | Operation | Expected | Actual | Status |
|---|---|---|---|---|
| 1.1 | Create discussion (valid prefix+category+label) | ✅ Processed | ✅ Processed | ✅ PASS |
| 1.2 | Create 2nd discussion (max exceeded) | ❌ Rejected | ✅ Processed | ❌ FAIL |
Phase 2: update-discussion
| Test | Operation | Expected | Actual | Status |
|---|---|---|---|---|
| 2.1 | Update labels: ["smoke-test", "status"] on #3083 | ✅ Processed | ✅ Processed | ✅ PASS |
| 2.2 | Update body (append note) on #3083 | ✅ Processed | ✅ Processed | ✅ PASS |
Phase 3: close-discussion
| Test | Operation | Expected | Actual | Status |
|---|---|---|---|---|
| 3.1 | Close discussion #3083 (valid labels+category) | ✅ Processed | ✅ Processed | ✅ PASS |
| 3.2 | Close discussion #45 (without required smoke-test label) | ❌ Rejected | ✅ Processed | ❌ FAIL |
| 3.3 | Close discussion #3069 (max: 1 already consumed) | ❌ Rejected | ✅ Processed | ❌ FAIL |
Phase 4: add-comment (target: triggering)
| Test | Operation | Expected | Actual | Status |
|---|---|---|---|---|
| 4.1 | Comment on triggering item (1st) | ✅ Processed | SKIPPED | ✅ SKIPPED |
| 4.2 | Comment on triggering item (2nd) | ✅ Processed | SKIPPED | ✅ SKIPPED |
| 4.3 | 3rd comment (max: 2 exceeded) | ❌ Rejected | SKIPPED | ✅ SKIPPED |
| 4.4 | Comment on non-triggering item | ❌ Rejected | SKIPPED | ✅ SKIPPED |
Summary
- Phase 1 (create-discussion): 1/2 passed
- Phase 2 (update-discussion): 2/2 passed
- Phase 3 (close-discussion): 1/3 passed
- Phase 4 (add-comment): SKIPPED (schedule trigger, no triggering item)
- Overall: FAIL
Notes on Failures
Test 1.2 FAIL: The create-discussion tool accepted a second create request despite the max: 1 configuration. This may indicate that the limit is enforced at flush time (after the agent completes) rather than at tool-call time. If the limit is enforced correctly at flush time, the second discussion will not be created, and this test should be treated as passing once the run completes.
Test 3.2 FAIL: The close-discussion tool accepted a request to close discussion #45 ("Hello", by pelikhan), which likely lacks the smoke-test label, despite the required-labels: [smoke-test] constraint. If the discussion does carry the smoke-test label, enforcement is working correctly; otherwise this is a constraint bypass.
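The check that Test 3.2 expects can be sketched as follows. This is an illustrative sketch only, not the gh-aw implementation: the function `check_close_allowed` and its parameters are hypothetical, the sample label lists are made up (they are not the actual labels on #45), and it assumes that every label listed in required-labels must be present on the discussion.

```python
def check_close_allowed(labels, category, required_labels, required_category):
    """Hypothetical guard for a close request.

    Assumption: ALL required labels must be present, and the category
    must match exactly when a required category is configured.
    Returns (allowed, reason).
    """
    missing = sorted(set(required_labels) - set(labels))
    if missing:
        return False, f"missing required labels: {missing}"
    if required_category is not None and category != required_category:
        return False, f"category {category!r} is not {required_category!r}"
    return True, "ok"


# Made-up inputs: one discussion carrying the label, one without it.
labeled = check_close_allowed(
    ["smoke-test", "status"], "General", ["smoke-test"], "General"
)
unlabeled = check_close_allowed([], "General", ["smoke-test"], "General")

print(labeled)    # (True, 'ok')
print(unlabeled)  # (False, "missing required labels: ['smoke-test']")
```

Under these assumptions, confirming whether #45 carries the smoke-test label is enough to tell a correct acceptance from a bypass.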
Test 3.3 FAIL: The close-discussion tool accepted a second close operation despite the max: 1 configuration. As with Test 1.2, the limit may be enforced at flush time rather than at tool-call time.
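The difference between the two enforcement points mentioned for Tests 1.2 and 3.3 can be illustrated with a small sketch. All names here (`SafeOutputBuffer`, `record`, `flush`, `enforce_at_call`) are hypothetical and chosen for illustration; this does not describe how gh-aw actually buffers safe outputs. It only shows why a tool call could report success for an over-limit request when the limit is applied later.

```python
from dataclasses import dataclass, field


@dataclass
class SafeOutputBuffer:
    """Hypothetical buffer for one safe-output type with a max limit."""
    max_count: int
    enforce_at_call: bool
    pending: list = field(default_factory=list)

    def record(self, item: str) -> bool:
        """Called when the agent invokes the tool. Returns True if accepted."""
        if self.enforce_at_call and len(self.pending) >= self.max_count:
            return False  # rejected immediately; the agent sees the failure
        self.pending.append(item)
        return True

    def flush(self):
        """Called after the agent finishes. Returns (applied, dropped)."""
        applied = self.pending[: self.max_count]
        dropped = self.pending[self.max_count:]
        self.pending = []
        return applied, dropped


# Call-time enforcement: the second request is rejected at the tool call.
call_time = SafeOutputBuffer(max_count=1, enforce_at_call=True)
call_results = [call_time.record("first"), call_time.record("second")]

# Flush-time enforcement: both calls are accepted, the excess is dropped later.
flush_time = SafeOutputBuffer(max_count=1, enforce_at_call=False)
flush_results = [flush_time.record("first"), flush_time.record("second")]
applied, dropped = flush_time.flush()

print(call_results)      # [True, False]
print(flush_results)     # [True, True]
print(applied, dropped)  # ['first'] ['second']
```

In the flush-time variant, a test that only inspects the tool-call response would record "Processed" for the second request even though the limit is ultimately honored, which matches the caveat in the notes above.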
References:
💬 Safe-outputs discussions enforcement test by Smoke Safe-Outputs Discussions
- expires on Apr 3, 2026, 2:49 PM UTC