Safety: formal_verification auto-proves all constraints due to ungrounded conclusion symbols (#7)

## Summary

`src/asi_build/safety/formal_verification.py` contains a critical logic flaw. The `_verify_single_constraint()` method constructs conclusions of the form `"satisfies_nonmaleficence"`, which are **opaque SymPy symbols with no logical relationship to the hypotheses**. The resolution prover could only prove such a conclusion if it were entailed by the axioms and hypotheses, and it never is. This produces two failure modes.

**Consistent hypotheses**: `satisfiable()` finds a model for `And(combined_premises, Not(conclusion))` simply by assigning the conclusion symbol false, so the constraint is always marked **not proved**.

**Inconsistent hypotheses**: the conjunction `And(combined_premises, Not(conclusion))` is unsatisfiable regardless of the conclusion, so the `not satisfiable(...)` check incorrectly reports the theorem as **proved for any conclusion**.

## Root Cause

In `_prove_by_resolution()` (lines 205–250):

```python
# Check satisfiability
is_valid = not satisfiable(And(combined_premises, Not(conclusion_formula)))
```

And in `_extract_facts_from_proposal()`, facts are built from keys like `"has_human_oversight"`, `"is_reversible"`, etc. The conclusion `"satisfies_nonmaleficence"` is a separate symbol not entailed by any axiom or hypothesis.

The `_initialize_ethical_axioms()` method adds axioms like `"beneficence"`, but none of these axioms ever produce `satisfies_<principle>` as a derivable conclusion.

**Net effect**: A constraint is marked valid exactly when SymPy's `satisfiable()` finds the premises inconsistent, and invalid otherwise. The result never reflects a meaningful semantic check of the proposal.

## Verification (reproduce with ~10 lines)

```python
from sympy import And, Not, Symbol
from sympy.logic.inference import satisfiable

# Hypotheses as extracted from a proposal; the conclusion is an unrelated symbol.
hyp = And(Symbol("has_human_oversight"), Symbol("is_reversible"))
conclusion = Symbol("satisfies_nonmaleficence")

# Consistent hypotheses: satisfiable() returns a model (a non-empty dict) in
# which the conclusion is False, so is_valid is False and nothing is proved.
print(not satisfiable(And(hyp, Not(conclusion))))  # False

# Inconsistent hypotheses: the conjunction is unsatisfiable, satisfiable()
# returns False, and the prover "proves" any conclusion at all.
bad_hyp = And(Symbol("x"), Not(Symbol("x")))
print(not satisfiable(And(bad_hyp, Not(conclusion))))  # True
```

## Impact

The safety formal verification layer cannot be relied upon. A malicious or buggy proposal that creates contradictory hypotheses will pass verification. More broadly, the prover provides no meaningful assurance.
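
The contradictory-hypotheses case can be caught by checking the premises for consistency before testing entailment. The following is a minimal sketch, assuming the premises are SymPy boolean expressions; `check_entailment` is a hypothetical helper for illustration, not a function that exists in the module.

```python
from sympy import And, Not, Symbol
from sympy.logic.inference import satisfiable


def check_entailment(premises, conclusion):
    """Return (proved, reason). Hypothetical helper, for illustration only."""
    combined = And(*premises)
    # Reject inconsistent premises first: they would entail every conclusion.
    if not satisfiable(combined):
        return False, "premises are inconsistent"
    # Premises are consistent, so unsatisfiability here means real entailment.
    if not satisfiable(And(combined, Not(conclusion))):
        return True, "conclusion is entailed by the premises"
    return False, "conclusion is not entailed by the premises"


x = Symbol("x")
goal = Symbol("satisfies_nonmaleficence")
print(check_entailment([x, Not(x)], goal))  # rejected instead of auto-proved
print(check_entailment([x], goal))          # ungrounded conclusion: not proved
print(check_entailment([goal], goal))       # trivially entailed
```

This guard only closes the vacuous-proof path. It does not make the ungrounded conclusion symbols provable for legitimate proposals.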

## Proposed Fix

Three approaches:

**Option A (Honest stubs)**: Mark all `prove_theorem()` calls as `validity=False` with `reason="Formal verification not implemented — manual review required"` until a correct implementation exists. This is safer than silently returning bogus results.
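
A fail-closed stub along these lines might look like the sketch below. The `ProofResult` type, the `prove_theorem_stub` name, and the exact wording of the reason string are assumptions made for illustration; the module's real return type and signature may differ.

```python
from dataclasses import dataclass

# Assumed wording; adjust to match the project's conventions.
UNVERIFIED_REASON = "Formal verification not implemented; manual review required"


@dataclass(frozen=True)
class ProofResult:
    """Hypothetical result type; the module's real return type may differ."""
    theorem: str
    validity: bool
    reason: str


def prove_theorem_stub(theorem: str) -> ProofResult:
    # Fail closed: never report a theorem as proved until a sound prover exists.
    return ProofResult(theorem=theorem, validity=False, reason=UNVERIFIED_REASON)


result = prove_theorem_stub("satisfies_nonmaleficence")
print(result.validity, "-", result.reason)
```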

**Option B (Semantic grounding)**: Add explicit semantic axiom rules that derive `satisfies_<principle>` from grounded fact combinations:
```python
# Example: nonmaleficence is satisfied if no_direct_harm AND is_reversible
Implies(And(no_direct_harm, is_reversible), satisfies_nonmaleficence)
```
These rules must be carefully designed per principle (not auto-generated).
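
To illustrate how a grounding rule changes the outcome, here is a small self-contained sketch. The rule is the illustrative one from the example above, not a vetted definition of nonmaleficence, and `is_entailed` is a hypothetical helper rather than part of the module.

```python
from sympy import And, Implies, Not, symbols
from sympy.logic.inference import satisfiable

no_direct_harm, is_reversible, satisfies_nonmaleficence = symbols(
    "no_direct_harm is_reversible satisfies_nonmaleficence"
)

# Hand-written grounding rule (illustrative only; real rules need review).
rule = Implies(And(no_direct_harm, is_reversible), satisfies_nonmaleficence)


def is_entailed(facts, conclusion):
    """Check whether the rule plus the given facts entail the conclusion."""
    premises = And(rule, *facts)
    return not satisfiable(And(premises, Not(conclusion)))


# Both grounded facts present: the conclusion is now derivable.
print(is_entailed([no_direct_harm, is_reversible], satisfies_nonmaleficence))  # True
# One fact missing: the conclusion is no longer derivable.
print(is_entailed([is_reversible], satisfies_nonmaleficence))  # False
```

This sketch does not check the facts for consistency, so contradictory facts would still make every conclusion provable unless that case is rejected separately.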

**Option C (SMT-based)**: Replace the SymPy propositional prover with a proper Z3-based or Dafny-based constraint checker that operates on quantified predicates.

## Security Note

This is listed as a research/safety issue, not a critical production bug (the module is marked research-alpha). However, any downstream use of `verify_proposal_ethics()` as a gate should be disabled until this is resolved.

## Labels: `bug`, `safety`, `help wanted`, `research`

