Show & Tell: Phase 11.1 — SafetyFilter (constitutional ruleset, BLOCK/CRITICAL verdicts, autonomy loop pause) #338
web3guru888 started this conversation in Show and tell
Phase 11.1 — SafetyFilter architecture: constitutional ruleset gating for autonomous goal execution
Issue: #337
Phase: 11.1 (Safety & Alignment)
Status: In spec
Why SafetyFilter?
Phase 10 closes the self-management loop. But a fully autonomous agent that accepts and executes any goal it receives is unsafe by design.
SafetyFilter is the alignment layer: every Goal and every SubTask passes through a constitutional ruleset before it can enter the Phase 10 pipeline.
Component Map
The 7 Constitutional Rules
1. len(description) > max_goal_description_len (2048)
2. priority == CRITICAL without allow_critical=True
3. required_caps contains a blocked pattern (network:external, exec:arbitrary)
4. required_caps prefix not in allowed_capability_prefixes
5. max_subtasks_per_graph (128)
6. https?:// regex (potential exfiltration signal)
7. goal_type in hardcoded denylist (disable_safety_filter, self_modify_runtime, exfiltrate_data)

check_goal() Flow
Key point: predicates are wrapped in try/except. A buggy custom rule never causes a safety check failure; it silently passes. This is intentional: fail open for rule errors, fail closed for policy violations.

ViolationSeverity: escalation model
A CRITICAL violation immediately sets _paused = True. CognitiveCycle should check safety_filter._paused at each tick and skip goal processing until an operator clears the pause.

SafetyVerdict data model
Multiple violations can coexist. A goal can have SR-006 (WARN) and SR-001 (BLOCK) simultaneously: the verdict is blocked because of SR-001, but SR-006 is also recorded.
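The check flow, escalation model and verdict described above can be sketched roughly as follows. This is a minimal illustration, not the spec's implementation. The Goal fields, the SafetyRule shape, the numeric ordering of ViolationSeverity, the rule named CUSTOM-BUGGY, the CRITICAL severity given to the denylist rule, and the assumption that rule IDs follow the order of the rule list (so SR-001 is the description-length rule) are all guesses made for the example.

```python
import re
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Callable, List


class ViolationSeverity(IntEnum):
    # Only the three levels named in the post; the numeric ordering is assumed.
    WARN = 1
    BLOCK = 2
    CRITICAL = 3


@dataclass
class Goal:
    # Hypothetical stand-in for the real Goal type.
    goal_type: str
    description: str


@dataclass
class SafetyRule:
    rule_id: str
    severity: ViolationSeverity
    predicate: Callable[[Goal], bool]  # True means "this goal violates the rule"


@dataclass
class SafetyVerdict:
    allowed: bool
    violations: List[SafetyRule] = field(default_factory=list)


class SafetyFilter:
    def __init__(self, rules: List[SafetyRule]) -> None:
        self._rules = list(rules)
        self._paused = False

    def check_goal(self, goal: Goal) -> SafetyVerdict:
        violations: List[SafetyRule] = []
        for rule in self._rules:
            try:
                violated = bool(rule.predicate(goal))
            except Exception:
                # Fail open: a rule that raises is treated as "not violated".
                violated = False
            if violated:
                violations.append(rule)  # record every violation, WARN included
                if rule.severity is ViolationSeverity.CRITICAL:
                    self._paused = True  # the autonomy loop should now skip goals
        # Assumption: BLOCK and CRITICAL both block; WARN alone does not.
        blocked = any(r.severity >= ViolationSeverity.BLOCK for r in violations)
        return SafetyVerdict(allowed=not blocked, violations=violations)


DENYLIST = {"disable_safety_filter", "self_modify_runtime", "exfiltrate_data"}

safety_filter = SafetyFilter([
    SafetyRule("SR-001", ViolationSeverity.BLOCK, lambda g: len(g.description) > 2048),
    SafetyRule("SR-006", ViolationSeverity.WARN,
               lambda g: re.search(r"https?://", g.description) is not None),
    SafetyRule("SR-007", ViolationSeverity.CRITICAL, lambda g: g.goal_type in DENYLIST),
    # Deliberately broken rule: raises AttributeError, so it silently passes.
    SafetyRule("CUSTOM-BUGGY", ViolationSeverity.BLOCK, lambda g: g.no_such_field),
])

long_goal = Goal("research", "see https://example.com " + "x" * 3000)
verdict = safety_filter.check_goal(long_goal)
print(verdict.allowed, [v.rule_id for v in verdict.violations], safety_filter._paused)
```

In this sketch the long goal is blocked by SR-001 while SR-006 is still recorded, the broken rule contributes nothing, and the filter only pauses once a goal from the denylist is checked.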
5 Prometheus Metrics
- asi_safety_checks_total (labels: subject_type (goal/subtask/graph), result (allowed/blocked))
- asi_safety_violations_total (labels: rule_id, severity)
- asi_safety_paused
- asi_safety_rules_registered
- asi_safety_check_duration_seconds (labels: subject_type)

PromQL, critical violation alert:
Open Questions
1. Predicate persistence: Rule predicates are lambdas, so they cannot be serialized to FederatedBlackboard. Should safety rules be stored as named rule IDs with a registry lookup, rather than raw callables?
2. Async predicates: Some rules (e.g. "is this peer in the federation allowlist?") require async lookups. Should the predicate signature be async (goal) -> bool?
3. Pause recovery: When _paused = True fires, how should it be cleared? Options: (a) operator API call safety_filter.resume(), (b) automatic after N seconds, (c) require all CRITICAL violations to be acknowledged.

Issue #337 | Phase 11 planning → #336 | Phase 10.5 ReplanningEngine → #333
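On the predicate persistence question, one possible shape for the "named rule IDs with a registry lookup" option is sketched below. Everything here is hypothetical: RULE_REGISTRY, register_rule, dump_ruleset, load_ruleset, the plain-dict goal, and the pairing of SR-001 with the description-length check are illustration only, and nothing is assumed about how FederatedBlackboard stores data beyond it accepting a string.

```python
import json
from typing import Callable, Dict, List

# Hypothetical registry: stable rule ID -> predicate over a plain-dict goal.
RULE_REGISTRY: Dict[str, Callable[[dict], bool]] = {}


def register_rule(rule_id: str):
    """Decorator that files a predicate under a serializable ID."""
    def decorator(fn: Callable[[dict], bool]) -> Callable[[dict], bool]:
        RULE_REGISTRY[rule_id] = fn
        return fn
    return decorator


@register_rule("SR-001")
def description_too_long(goal: dict) -> bool:
    # Assumed mapping of SR-001 to the 2048-character description limit.
    return len(goal.get("description", "")) > 2048


def dump_ruleset(rule_ids: List[str]) -> str:
    # Only the IDs are persisted, never the callables themselves.
    return json.dumps({"rules": rule_ids})


def load_ruleset(payload: str) -> List[Callable[[dict], bool]]:
    # An unknown ID raises KeyError, so a peer cannot silently drop a rule.
    return [RULE_REGISTRY[rule_id] for rule_id in json.loads(payload)["rules"]]


payload = dump_ruleset(["SR-001"])
predicates = load_ruleset(payload)
```

Raising on an unknown ID is a design choice in this sketch: a missing rule is treated as a policy problem (fail closed) rather than a rule error (fail open).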