Summary
src/asi_build/safety/formal_verification.py contains a critical logic flaw. The _verify_single_constraint() method constructs conclusions of the form "satisfies_nonmaleficence", which are opaque SymPy symbols with no logical relationship to the hypotheses. The resolution prover can therefore prove such a conclusion only if it appears directly in the axiom set. It never does, so the negated symbol is trivially satisfiable alongside any consistent premises, satisfiable() returns a model, and the constraint is marked not proved. The not satisfiable(...) pattern is also fragile in the other direction: if the hypotheses are themselves unsatisfiable, the conjunction And(combined_premises, Not(conclusion)) is unsatisfiable too, so the prover incorrectly reports the theorem as proved for any conclusion.
Root Cause
In _prove_by_resolution() (lines 205–250):
# Check satisfiability
is_valid = not satisfiable(And(combined_premises, Not(conclusion_formula)))
And in _extract_facts_from_proposal(), facts are built from keys like "has_human_oversight", "is_reversible", etc. The conclusion "satisfies_nonmaleficence" is a separate symbol not entailed by any axiom or hypothesis.
The _initialize_ethical_axioms() method adds axioms like "beneficence", but none of these axioms ever produce satisfies_<principle> as a derivable conclusion.
Net effect: a constraint is marked valid only when the combined premises are inconsistent, and invalid otherwise. The result reflects the consistency of the premises, not any meaningful semantic check.
Verification (reproduce with ~10 lines)
from sympy import And, Not, Symbol
from sympy.logic.inference import satisfiable
# Consistent hypotheses that say nothing about the conclusion symbol
hyp = And(Symbol("has_human_oversight"), Symbol("is_reversible"))
conclusion = Symbol("satisfies_nonmaleficence")  # has no relation to hyp
# satisfiable() returns a model (a non-empty dict), so is_valid is False: never proved
print(not satisfiable(And(hyp, Not(conclusion))))  # False
# BUT if the hypotheses are inconsistent, satisfiable() returns False
bad_hyp = And(Symbol("x"), Not(Symbol("x")))
# not satisfiable(And(bad_hyp, Not(conclusion))) == True --> auto-proves!
print(not satisfiable(And(bad_hyp, Not(conclusion))))  # True
Impact
The safety formal verification layer cannot be relied upon. A malicious or buggy proposal that creates contradictory hypotheses will pass verification. More broadly, the prover provides no meaningful assurance.
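The contradictory-hypotheses case can be detected by testing the premises for consistency before running the refutation check. The following is a minimal sketch of that idea, not code from the module: the Verdict enum and the check_entailment() helper are hypothetical names introduced only for illustration.

```python
from enum import Enum

from sympy import And, Not, Symbol
from sympy.logic.inference import satisfiable


class Verdict(Enum):
    """Hypothetical three-way result, so vacuous proofs are not reported as proofs."""
    PROVED = "proved"
    NOT_PROVED = "not_proved"
    INCONSISTENT_PREMISES = "inconsistent_premises"


def check_entailment(premises, conclusion):
    """Hypothetical helper: classify whether the premises entail the conclusion."""
    combined = And(*premises)
    # Guard: contradictory premises entail everything, so report them separately.
    if not satisfiable(combined):
        return Verdict.INCONSISTENT_PREMISES
    # The premises are consistent, so the usual refutation check is meaningful.
    if not satisfiable(And(combined, Not(conclusion))):
        return Verdict.PROVED
    return Verdict.NOT_PROVED


x = Symbol("x")
goal = Symbol("satisfies_nonmaleficence")
print(check_entailment([x, Not(x)], goal))  # Verdict.INCONSISTENT_PREMISES
print(check_entailment([x], goal))          # Verdict.NOT_PROVED
print(check_entailment([x], x))             # Verdict.PROVED
```

This guard only removes the vacuous-proof path. It does not make the conclusion symbols meaningful, so it complements the fixes proposed below and does not replace them.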
Proposed Fix
Three approaches:
Option A (Honest stubs): Mark all prove_theorem() calls as validity=False with reason="Formal verification not implemented — manual review required" until a correct implementation exists. This is safer than silently returning bogus results.
Option B (Semantic grounding): Add explicit semantic axiom rules that derive satisfies_<principle> from grounded fact combinations:
# Example: nonmaleficence is satisfied if no_direct_harm AND is_reversible
Implies(And(no_direct_harm, is_reversible), satisfies_nonmaleficence)
These rules must be carefully designed per principle (not auto-generated).
Option C (SMT-based): Replace the SymPy propositional prover with a proper Z3-based or Dafny-based constraint checker that operates on quantified predicates.
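As a rough illustration of how Options A and B could coexist, the sketch below uses the example nonmaleficence rule from Option B and falls back to an Option A style stub for any principle that has no hand-written rule. GROUNDING_RULES, verify_principle(), the dict-of-booleans fact format, and the reason strings are all assumptions made for this example. They are not the module's actual API.

```python
from sympy import And, Implies, Not, Symbol
from sympy.logic.inference import satisfiable

no_direct_harm = Symbol("no_direct_harm")
is_reversible = Symbol("is_reversible")
satisfies_nonmaleficence = Symbol("satisfies_nonmaleficence")

# Hypothetical table of hand-written grounding rules, one entry per principle.
GROUNDING_RULES = {
    "nonmaleficence": Implies(
        And(no_direct_harm, is_reversible), satisfies_nonmaleficence
    ),
}


def verify_principle(principle, facts):
    """Return (validity, reason) given a dict mapping fact names to booleans."""
    rule = GROUNDING_RULES.get(principle)
    if rule is None:
        # Option A fallback: refuse to claim a proof that cannot be grounded.
        return False, "Formal verification not implemented; manual review required"

    # Encode each fact as a literal: the symbol if true, its negation if false.
    literals = [
        Symbol(name) if value else Not(Symbol(name))
        for name, value in facts.items()
    ]
    premises = And(rule, *literals)

    # Reject inconsistent premises so that nothing is proved vacuously.
    if not satisfiable(premises):
        return False, "Premises are inconsistent"

    conclusion = Symbol(f"satisfies_{principle}")
    proved = not satisfiable(And(premises, Not(conclusion)))
    reason = "Derived from grounding rule" if proved else "Not derivable from facts"
    return proved, reason


print(verify_principle("nonmaleficence", {"no_direct_harm": True, "is_reversible": True}))
print(verify_principle("nonmaleficence", {"no_direct_harm": True, "is_reversible": False}))
print(verify_principle("beneficence", {"no_direct_harm": True}))
```

With a rule in place, the conclusion symbol is entailed only when the grounded facts support it, and a principle without a rule is reported as unverified instead of being silently passed or failed.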
Security Note
This is listed as a research/safety issue, not a critical production bug (the module is marked research-alpha). However, any downstream use of verify_proposal_ethics() as a gate should be disabled until this is resolved.
Labels: bug, safety, help wanted, research