[smoke-safeoutputs] Smoke Safe-Outputs Discussions: 23946580059 #3126

@github-actions

Safe-Outputs Discussions Enforcement Test Results

Run: https://github.com/github/gh-aw-mcpg/actions/runs/23946580059
Trigger: schedule
Configuration tested:

  • create-discussion (max: 1, prefix, category)
  • update-discussion (enabled, all fields)
  • close-discussion (required-category: General, required-labels: [smoke-test])
  • add-comment (max: 2, target: triggering)

Phase 1: create-discussion

| Test | Operation | Expected | Actual | Status |
|------|-----------|----------|--------|--------|
| 1.1 | Create discussion (valid prefix + category + label) | ✅ Processed | ✅ Processed | ✅ PASS |
| 1.2 | Create 2nd discussion (max exceeded) | ❌ Rejected | ✅ Processed | ❌ FAIL |

Phase 2: update-discussion

| Test | Operation | Expected | Actual | Status |
|------|-----------|----------|--------|--------|
| 2.1 | Update labels: ["smoke-test", "status"] on #3083 | ✅ Processed | ✅ Processed | ✅ PASS |
| 2.2 | Update body (append note) on #3083 | ✅ Processed | ✅ Processed | ✅ PASS |

Phase 3: close-discussion

| Test | Operation | Expected | Actual | Status |
|------|-----------|----------|--------|--------|
| 3.1 | Close discussion #3083 (valid labels + category) | ✅ Processed | ✅ Processed | ✅ PASS |
| 3.2 | Close discussion #45 (presumably without the required smoke-test label) | ❌ Rejected | ✅ Processed | ❌ FAIL |
| 3.3 | Close discussion #3069 (max: 1 already consumed) | ❌ Rejected | ✅ Processed | ❌ FAIL |

Phase 4: add-comment (target: triggering)

| Test | Operation | Expected | Actual | Status |
|------|-----------|----------|--------|--------|
| 4.1 | Comment on triggering item (1st) | ✅ Processed | SKIPPED | ✅ SKIPPED |
| 4.2 | Comment on triggering item (2nd) | ✅ Processed | SKIPPED | ✅ SKIPPED |
| 4.3 | 3rd comment (max: 2 exceeded) | ❌ Rejected | SKIPPED | ✅ SKIPPED |
| 4.4 | Comment on non-triggering item | ❌ Rejected | SKIPPED | ✅ SKIPPED |

Summary

  • Phase 1 (create-discussion): 1/2 passed
  • Phase 2 (update-discussion): 2/2 passed
  • Phase 3 (close-discussion): 1/3 passed
  • Phase 4 (add-comment): SKIPPED (scheduled trigger, so there is no triggering item)
  • Overall: FAIL (4/7 executed tests passed, 3 failed)
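
Phase 4 was skipped because a scheduled run has nothing for `target: triggering` to resolve to. The Python sketch below shows one way that resolution could work against a GitHub Actions event payload. It relies only on the fact that issue, pull request, and discussion event payloads carry the item under an `issue`, `pull_request`, or `discussion` key with a `number` field, while a `schedule` payload has none of these. The function names `resolve_triggering_target` and `plan_comments`, and the choice to report "skipped" instead of raising an error, are assumptions chosen to mirror the table above, not gh-aw's actual code.

```python
from __future__ import annotations

from typing import Optional

# Payload keys under which GitHub events carry the triggering item:
# issues / issue_comment -> "issue", pull_request -> "pull_request",
# discussion / discussion_comment -> "discussion".
TRIGGER_KEYS = ("issue", "pull_request", "discussion")


def resolve_triggering_target(payload: dict) -> Optional[int]:
    """Hypothetical helper: return the number of the item that triggered
    the run, or None if the event (e.g. a schedule) has no such item."""
    for key in TRIGGER_KEYS:
        item = payload.get(key)
        if isinstance(item, dict) and "number" in item:
            return item["number"]
    return None


def plan_comments(payload: dict, requested_targets: list[int], max_comments: int = 2) -> list[str]:
    """Hypothetical model of add-comment with target: triggering and a max.
    Returns one outcome per requested comment, in order."""
    triggering = resolve_triggering_target(payload)
    outcomes: list[str] = []
    accepted = 0
    for target in requested_targets:
        if triggering is None:
            outcomes.append("skipped")    # nothing to comment on (e.g. schedule run)
        elif target != triggering:
            outcomes.append("rejected")   # only the triggering item is allowed
        elif accepted >= max_comments:
            outcomes.append("rejected")   # max already reached
        else:
            accepted += 1
            outcomes.append("processed")
    return outcomes


if __name__ == "__main__":
    # Issue numbers 7 and 8 are made up for illustration.
    print(plan_comments({"issue": {"number": 7}}, [7, 7, 7, 8]))
    # ['processed', 'processed', 'rejected', 'rejected']
    print(plan_comments({"schedule": "*/30 * * * *"}, [7, 7, 7, 8]))  # example cron string
    # ['skipped', 'skipped', 'skipped', 'skipped']
```

With a triggering issue, the four requests reproduce the expected column for tests 4.1 to 4.4; with a schedule payload, every request is skipped, as in this run.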

Notes on Failures

Test 1.2 FAIL: The create-discussion tool accepted a second discussion despite the max: 1 configuration. This may indicate that the limit is enforced at flush time (after the agent completes) rather than at tool-call time. If enforcement does occur correctly at flush time, only one discussion should actually be created once execution completes, and this FAIL would reflect only the call-time response rather than a real constraint bypass.
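
The two enforcement points contrasted in this note can be sketched in Python. `CallTimeLimiter` and `FlushTimeLimiter` are hypothetical classes, not gh-aw-mcpg's implementation; they only show how a flush-time limit would answer "processed" to a second call (as observed in test 1.2) while still emitting a single item.

```python
from dataclasses import dataclass, field


@dataclass
class CallTimeLimiter:
    """Hypothetical: rejects a call as soon as it would exceed the max."""
    max_items: int
    accepted: list = field(default_factory=list)

    def call(self, item: str) -> str:
        if len(self.accepted) >= self.max_items:
            return "rejected"
        self.accepted.append(item)
        return "processed"

    def flush(self) -> list:
        return list(self.accepted)


@dataclass
class FlushTimeLimiter:
    """Hypothetical: acknowledges every call, applies the max only at flush."""
    max_items: int
    queued: list = field(default_factory=list)

    def call(self, item: str) -> str:
        self.queued.append(item)  # nothing is rejected while the agent runs
        return "processed"

    def flush(self) -> list:
        return self.queued[: self.max_items]  # extra items are dropped here


if __name__ == "__main__":
    for limiter in (CallTimeLimiter(max_items=1), FlushTimeLimiter(max_items=1)):
        responses = [limiter.call("discussion A"), limiter.call("discussion B")]
        print(type(limiter).__name__, responses, limiter.flush())
    # CallTimeLimiter ['processed', 'rejected'] ['discussion A']
    # FlushTimeLimiter ['processed', 'processed'] ['discussion A']
```

Both variants emit the same final output; they differ only in what the agent sees per call, which is exactly what this test checks.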

Test 3.2 FAIL: The close-discussion tool accepted closing discussion #45 ("Hello", by pelikhan) despite the required-labels: [smoke-test] constraint. The test assumed this discussion lacks the smoke-test label, but that was not verified. If the discussion does carry the label, enforcement worked correctly; otherwise this is a constraint bypass.
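
The precondition that Test 3.2 exercises can be written as a check on the discussion's metadata. This Python sketch assumes that every label in required-labels must be present, that label and category names match exactly (case-sensitive), and that the category is compared by name. `Discussion` and `check_close_allowed` are hypothetical names, and the example discussions are invented, not the real metadata of #45.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Discussion:
    """Hypothetical, minimal view of a discussion's metadata."""
    category: str
    labels: frozenset[str] = field(default_factory=frozenset)


def check_close_allowed(
    discussion: Discussion,
    required_category: str,
    required_labels: set[str],
) -> tuple[bool, str]:
    """Return (allowed, reason). Assumes all required labels must be present."""
    if discussion.category != required_category:
        return False, f"category {discussion.category!r} is not {required_category!r}"
    missing = required_labels - discussion.labels
    if missing:
        return False, f"missing required labels: {sorted(missing)}"
    return True, "ok"


if __name__ == "__main__":
    required = {"smoke-test"}
    labelled = Discussion("General", frozenset({"smoke-test", "status"}))
    unlabelled = Discussion("General", frozenset())
    print(check_close_allowed(labelled, "General", required))
    # (True, 'ok')
    print(check_close_allowed(unlabelled, "General", required))
    # (False, "missing required labels: ['smoke-test']")
```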

Test 3.3 FAIL: The close-discussion tool accepted a second close operation despite the max: 1 configuration. As with Test 1.2, the limit may be enforced at flush time rather than at tool-call time.

References:

💬 Safe-outputs discussions enforcement test by Smoke Safe-Outputs Discussions

  • expires on Apr 3, 2026, 2:49 PM UTC