SMS/WhatsApp relay, emergency approval, contact preferences & testing improvements#32

Merged: Josephrp merged 10 commits into dev from twilio on Mar 7, 2026

Josephrp (Owner) commented Mar 7, 2026

solves #9

Summary

This PR adds SMS and WhatsApp support (Twilio), emergency message approval (operator-in-the-loop), contact preferences and notify-on-relay (per-channel opt-out), consolidates docs into a single Response & compliance page, and improves CI and test reliability (Postgres + migrations, test fixes).

Latest commits:

  • e542fd6 — adds sms and whatsapp + emergency contact
  • 501ea54 — improves testing

1. SMS & WhatsApp (Twilio)

  • Config: twilio.account_sid, twilio.auth_token, twilio.from_number, twilio.whatsapp_from (env RADIOSHAQ_TWILIO__* or YAML). Documented in Configuration → Twilio.
  • Outbound: Single outbound dispatcher sends via SMS/WhatsApp when the MessageBus consumer is enabled. SMS and WhatsApp agents use the same Twilio client; factory passes client= to WhatsAppAgent (fixed parameter name).
  • Setup: radioshaq setup --reconfigure captures Twilio credentials (including auth token) before clearing secrets and writes them to .env; TTS reconfigure path now returns and persists the ElevenLabs API key.
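As a rough illustration of how `RADIOSHAQ_TWILIO__*` environment variables could map onto the nested `twilio.*` keys, here is a minimal stdlib-only sketch. The loader function, the exact variable names, and the placeholder values are assumptions for illustration; the project's real config loader may work differently.

```python
from typing import Mapping

PREFIX = "RADIOSHAQ_"  # prefix taken from the PR text
DELIMITER = "__"       # nested-key delimiter, as in RADIOSHAQ_TWILIO__*


def load_nested_env(environ: Mapping[str, str], prefix: str = PREFIX) -> dict:
    """Turn PREFIX_SECTION__KEY=value pairs into {"section": {"key": value}}.

    Hypothetical helper, not the project's actual loader.
    """
    config: dict = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue  # unrelated variable
        path = name[len(prefix):].lower().split(DELIMITER)
        node = config
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return config


# Placeholder values only; the variable names are assumed from the key names.
example_env = {
    "RADIOSHAQ_TWILIO__ACCOUNT_SID": "ACxxxxxxxx",
    "RADIOSHAQ_TWILIO__AUTH_TOKEN": "secret",
    "RADIOSHAQ_TWILIO__FROM_NUMBER": "+15550100",
    "RADIOSHAQ_TWILIO__WHATSAPP_FROM": "whatsapp:+15550100",
    "PATH": "/usr/bin",  # ignored: no RADIOSHAQ_ prefix
}
twilio_cfg = load_nested_env(example_env)["twilio"]
```

Passing `os.environ` instead of `example_env` would read the same keys from the real environment.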

2. Emergency approval (operator receive & transmit)

  • Purpose: Operator receives emergency requests and transmits by approving; messages are queued until approval.
  • Config: RADIOSHAQ_EMERGENCY_CONTACT__ENABLED, RADIOSHAQ_EMERGENCY_CONTACT__REGIONS_ALLOWED. See Response & compliance.
  • API:
    • GET /emergency/pending-count, GET /emergency/events, GET /emergency/events/stream (SSE)
    • POST /emergency/events/{id}/approve, POST /emergency/events/{id}/reject
    • Relay with emergency: true and SMS/WhatsApp target queues for approval; API route returns queued_for_approval + event_id (no KeyError).
  • Web UI: Emergency page (list pending, approve/reject, notes), polling + audio alert + browser notifications when pending count goes 0→N; i18n (EN/FR/ES).
  • Backend: Coordination events stored in DB; extra_data updates use a new dict so SQLAlchemy persists audit fields (approved_at, approved_by, sent_at, rejected_at). Relay service and orchestrator relay tool receive full Config for emergency_contact.
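The approve and reject endpoints above could be called from a script roughly as follows. This is a sketch only: the base URL, the bearer-token header, and the `notes` field in the JSON body are assumptions, since the PR text does not spell out the request shape.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # assumption: local API server


def build_decision_request(event_id: str, decision: str, notes: str = "", token: str = "") -> Request:
    """Build (but do not send) a POST to /emergency/events/{id}/approve or /reject."""
    if decision not in ("approve", "reject"):
        raise ValueError("decision must be 'approve' or 'reject'")
    url = f"{BASE_URL}/emergency/events/{event_id}/{decision}"
    body = json.dumps({"notes": notes}).encode("utf-8")  # 'notes' field is hypothetical
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"  # auth scheme is an assumption
    return Request(url, data=body, headers=headers, method="POST")


request = build_decision_request("42", "approve", notes="confirmed by operator")
```

The request can then be sent with `urllib.request.urlopen(request)` against a running instance.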

3. Relay (radio + SMS/WhatsApp)

  • Relay route: When target_channel is sms or whatsapp, target_band is not looked up in band plans (avoids KeyError); target_freq default 0 for non-radio.
  • Relay delivery worker:
    • Only marks transcript delivered when delivery actually succeeded: radio path after inject (+ optional TX); SMS/WhatsApp path only when publish_outbound returns true (avoids marking delivered when queue is full; failed items stay pending for retry).
    • Notify-on-relay uses per-channel opt-out (see below).
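The "mark delivered only on success" rule for the SMS/WhatsApp path can be sketched with a bounded in-memory queue. The `OutboundBus` and `Transcript` classes are simplified stand-ins, not the project's MessageBus or models, and whether the real `publish_outbound` is async is an assumption.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Transcript:
    id: int
    text: str
    delivered: bool = False


class OutboundBus:
    """Stand-in bus: a bounded queue whose publish reports success or failure."""

    def __init__(self, maxsize: int) -> None:
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)

    async def publish_outbound(self, channel: str, phone: str, message: str) -> bool:
        try:
            self._queue.put_nowait((channel, phone, message))
        except asyncio.QueueFull:
            return False  # caller must not treat this as delivered
        return True


async def deliver(bus: OutboundBus, transcript: Transcript, channel: str, phone: str) -> bool:
    ok = await bus.publish_outbound(channel, phone, transcript.text)
    if ok:
        transcript.delivered = True  # only set after a successful publish
    return ok  # on False the item stays pending for a later retry


async def demo():
    bus = OutboundBus(maxsize=1)
    first, second = Transcript(1, "QSL"), Transcript(2, "QRZ")
    await deliver(bus, first, "sms", "+15550100")
    await deliver(bus, second, "sms", "+15550100")  # queue is already full
    return first, second


first, second = asyncio.run(demo())
```

Here the second transcript keeps `delivered=False` because the queue was full, so a retry loop would pick it up again.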

4. Contact preferences & per-channel opt-out

  • API: GET/PATCH /callsigns/registered/{callsign}/contact-preferences for notify-on-relay (SMS/WhatsApp phones, consent). New migration adds notify_opt_out_at_sms and notify_opt_out_at_whatsapp.
  • Opt-out: record_opt_out(callsign, channel) and record_opt_out_by_phone(phone, channel) set only the channel-specific timestamp and clear that channel’s phone. No single global notify_opt_out_at for the worker.
  • Worker: Notify-on-relay checks notify_opt_out_at_sms / notify_opt_out_at_whatsapp per channel; opting out of SMS does not block WhatsApp and vice versa.
  • Reversing opt-out: set_contact_preferences clears notify_opt_out_at_sms when a non-empty SMS phone is set, and notify_opt_out_at_whatsapp when a non-empty WhatsApp phone is set.
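The per-channel semantics described above can be modelled in memory as follows. This is a simplified sketch: the functions take a row object rather than a callsign, and the phone field names (`sms_phone`, `whatsapp_phone`) are assumptions; only the `notify_opt_out_at_*` names come from the PR.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

CHANNELS = ("sms", "whatsapp")


@dataclass
class ContactRow:
    callsign: str
    sms_phone: Optional[str] = None          # hypothetical field name
    whatsapp_phone: Optional[str] = None     # hypothetical field name
    notify_opt_out_at_sms: Optional[datetime] = None
    notify_opt_out_at_whatsapp: Optional[datetime] = None


def record_opt_out(row: ContactRow, channel: str) -> None:
    """Set only the channel-specific timestamp and clear that channel's phone."""
    if channel not in CHANNELS:
        raise ValueError(f"unknown channel: {channel}")
    setattr(row, f"notify_opt_out_at_{channel}", datetime.now(timezone.utc))
    setattr(row, f"{channel}_phone", None)


def set_contact_preferences(row: ContactRow, sms_phone: str = "", whatsapp_phone: str = "") -> None:
    """Setting a non-empty phone reverses the opt-out for that channel only."""
    if sms_phone:
        row.sms_phone = sms_phone
        row.notify_opt_out_at_sms = None
    if whatsapp_phone:
        row.whatsapp_phone = whatsapp_phone
        row.notify_opt_out_at_whatsapp = None


def can_notify(row: ContactRow, channel: str) -> bool:
    phone = getattr(row, f"{channel}_phone")
    opted_out = getattr(row, f"notify_opt_out_at_{channel}")
    return bool(phone) and opted_out is None


row = ContactRow("K1ABC", sms_phone="+15550100", whatsapp_phone="+15550101")
record_opt_out(row, "sms")
after_opt_out = (can_notify(row, "sms"), can_notify(row, "whatsapp"))
set_contact_preferences(row, sms_phone="+15550100")
after_reversal = (can_notify(row, "sms"), can_notify(row, "whatsapp"))
```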

5. Documentation

  • Single page: “Response & compliance” (response-compliance-and-monitoring.md) covers: operator response (emergency, relay, contact preferences), compliance (radio restricted bands, band plans, country mapping; messaging consent/opt-out), and monitoring (Prometheus, health, WebSocket).
  • Removed: Standalone monitoring.md; nav entry is now “Response & compliance”. compliance-regulatory.md is a short stub pointing to that page.
  • Configuration: Twilio, TTS (ElevenLabs/Kokoro), ASR (voxtral/whisper/scribe), compliance/region table and band_plan_region documented in Configuration. API reference updated (GIS, emergency, metrics link).

6. CI & test harness

  • Workflows: test-ci, publish-nightly, publish-pypi use a Postgres service (port 5434), Wait for Postgres, and Run database migrations before tests so the schema (including notify_opt_out_at_sms / notify_opt_out_at_whatsapp) is current.
  • Test client: Any test using the client fixture triggers a session-scoped migration run (_run_db_migrations) so DB-using tests (e.g. callsigns) see the latest schema; if migrations fail (e.g. no Postgres), the session is skipped.
  • Test fixes:
    • Relay delivery notify test: assertion relaxed so multiple get_contact_preferences calls (worker loop) are allowed; still asserts at least one call with the destination callsign.
    • Setup reconfigure mock: _run_reconfigure_prompts now returns 11 values (including elevenlabs_key_reconfigure); mock in test_run_setup_reconfigure_mocked_merges_config updated accordingly.
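The relaxed assertion in the relay delivery notify test follows a common `unittest.mock` pattern: use `assert_any_call` instead of `assert_called_once_with` when a loop may call the mocked method several times. A minimal sketch, with hypothetical callsigns and a plain loop standing in for the worker:

```python
from unittest.mock import MagicMock

db = MagicMock()
db.get_contact_preferences.return_value = {"notify_on_relay": True}

# Stand-in for a worker loop that looks up preferences more than once.
for callsign in ("K1ABC", "K1ABC", "W2XYZ"):
    db.get_contact_preferences(callsign)

# Strict form: fails because the method was called three times.
try:
    db.get_contact_preferences.assert_called_once_with("K1ABC")
    strict_passed = True
except AssertionError:
    strict_passed = False

# Relaxed form: passes as long as at least one call used the destination callsign.
db.get_contact_preferences.assert_any_call("K1ABC")
call_count = db.get_contact_preferences.call_count
```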

7. Other fixes and tweaks

  • Factory: WhatsAppAgent is constructed with client=sms_client (not twilio_client) to match WhatsAppAgent.__init__(client=..., from_number=...).
  • Setup: Reconfigure TTS branch captures and returns the ElevenLabs key; caller uses it for write_env(elevenlabs_api_key=...) so the key is not lost.
  • Relay API: When the service returns queued_for_approval, the route returns ok, queued_for_approval, event_id, target_channel instead of accessing missing keys.
  • Coordination event extra_data: update_coordination_event uses dict(row.extra_data or {}) so SQLAlchemy detects the change and persists audit fields.
  • Gitignore: .ruff_cache/, .tmp_build/, .tmp_pytest/, dist-investigate/; cache/temp patterns; “RadioShaq” comment. LICENSE.md moved to repo root (GPL text only).

Migration and compatibility

  • DB: Run alembic upgrade head (or uv run alembic-upgrade) so registered_callsigns has notify_opt_out_at_sms and notify_opt_out_at_whatsapp. Existing rows keep notify_opt_out_at; legacy behaviour is “both channels opted out” when that timestamp is set and per-channel columns are null.
  • Config: New options under emergency_contact, twilio, tts; existing configs remain valid. Optional: set TEST_DATABASE_URL for tests if not using default postgresql+asyncpg://...@127.0.0.1:5434/radioshaq.
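The legacy rule for existing rows can be expressed as a small fallback function. This is a sketch of the stated behaviour, not the project's code; the function name is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def effective_opt_out(per_channel_at: Optional[datetime], legacy_at: Optional[datetime]) -> Optional[datetime]:
    """Per-channel timestamp wins; the legacy global one applies when it is null."""
    return per_channel_at or legacy_at


legacy = datetime(2025, 12, 1, tzinfo=timezone.utc)

# Pre-migration row: only the legacy column is set, so both channels count as opted out.
legacy_row = {"notify_opt_out_at": legacy, "notify_opt_out_at_sms": None, "notify_opt_out_at_whatsapp": None}
sms_blocked = effective_opt_out(legacy_row["notify_opt_out_at_sms"], legacy_row["notify_opt_out_at"]) is not None
whatsapp_blocked = effective_opt_out(legacy_row["notify_opt_out_at_whatsapp"], legacy_row["notify_opt_out_at"]) is not None

# Fresh row: nothing set, so neither channel is blocked.
fresh_blocked = effective_opt_out(None, None) is not None
```

Under this rule a legacy timestamp keeps applying until it is cleared, which is the re-opt-in concern raised in the review below.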

How to test

  • Unit + integration: From radioshaq/: uv run pytest tests/unit tests/integration -v. Requires Postgres on port 5434 (or set TEST_DATABASE_URL) for tests that use the DB; migrations run automatically when the client fixture is used.
  • Emergency flow: Enable emergency contact in config, create a relay with emergency: true and SMS/WhatsApp target; use Emergency page or API to approve/reject.
  • Notify-on-relay: Set contact preferences with SMS/WhatsApp phones and notify_on_relay: true; trigger a radio relay for that callsign and confirm a short SMS/WhatsApp is sent (Twilio configured). Opt out via POST /internal/opt-out and confirm only that channel stops.

Checklist

  • SMS/WhatsApp send via Twilio (config, agents, dispatcher)
  • Emergency approval API and Web UI (pending count, list, approve, reject, SSE)
  • Relay to SMS/WhatsApp and emergency queue; route handles queued_for_approval
  • Contact preferences API; per-channel opt-out and reversal in setup
  • Docs: Response & compliance page; Configuration (Twilio, TTS, ASR, compliance)
  • CI: Postgres service, migrations, web UI build (test-ci)
  • Tests: migration fixture, relay/setup/callsign assertions fixed

Greptile Summary

This PR adds SMS/WhatsApp delivery via Twilio, an operator-in-the-loop emergency approval workflow, per-channel notify-on-relay opt-out, and consolidates CI and docs. The overall architecture is solid and many previously-flagged issues (TOCTOU in approve, AudioContext leak, E.164 normalization, stop_event blocking, opt-out auth, stuck-event-on-failed-delivery) have been correctly addressed.

Critical issues remain:

  • Events permanently stuck in "approving" state (postgres_gis.py:583) — get_pending_coordination_events only queries status == "pending". If the server crashes after claim_emergency_event_pending commits but before the final state change completes, the event stays "approving" forever and is invisible to all operators with no recovery path.

  • Legacy notify_opt_out_at blocks per-channel re-opt-in (postgres_gis.py:763-764) — get_contact_preferences computes opt_out_sms = row.notify_opt_out_at_sms or row.notify_opt_out_at. set_contact_preferences clears only the per-channel column but never the legacy global column, so users who used the old opt-out mechanism can never successfully re-opt-in to SMS or WhatsApp notifications.

  • record_opt_out_by_phone raises unhandled MultipleResultsFound (postgres_gis.py:845) — Phone number columns have no UNIQUE constraint; scalar_one_or_none() raises sqlalchemy.exc.MultipleResultsFound if two callsigns share a phone, producing a 500 error and silently skipping the opt-out — a TCPA/GDPR compliance risk.

  • Reject endpoint lacks atomic claim (emergency.py:239-250) — reject_emergency_event reads, checks, then writes without using claim_emergency_event_pending. A concurrent approve+reject can both complete, leaving a delivered message recorded as "rejected".
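One possible way to address the shared-phone concern is to apply the opt-out to every matching row rather than expecting exactly one. The sketch below uses a list of dicts as a stand-in for the table; the phone column names are assumptions, and this is not the fix that was actually applied.

```python
from datetime import datetime, timezone
from typing import Dict, List

CHANNELS = ("sms", "whatsapp")


def record_opt_out_by_phone(rows: List[Dict], phone: str, channel: str) -> int:
    """Opt out every row whose phone for this channel matches; return the count."""
    if channel not in CHANNELS:
        raise ValueError(f"unknown channel: {channel}")
    phone_key = f"{channel}_phone"            # hypothetical column name
    stamp_key = f"notify_opt_out_at_{channel}"
    now = datetime.now(timezone.utc)
    updated = 0
    for row in rows:
        if row.get(phone_key) == phone:
            row[stamp_key] = now
            row[phone_key] = None
            updated += 1
    return updated


rows = [
    {"callsign": "K1ABC", "sms_phone": "+15550100", "whatsapp_phone": "+15550100"},
    {"callsign": "W2XYZ", "sms_phone": "+15550100", "whatsapp_phone": None},
    {"callsign": "N3DEF", "sms_phone": "+15550199", "whatsapp_phone": None},
]
updated = record_opt_out_by_phone(rows, "+15550100", "sms")
```

Both callsigns sharing the number are opted out of SMS, while the first row's WhatsApp phone is untouched.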

Confidence Score: 1/5

  • Not safe to merge — the database layer has three correctness bugs affecting opt-out compliance, and the emergency reject endpoint lacks atomic protection, allowing concurrent approve+reject race conditions.
  • The core SMS/WhatsApp, dispatcher, and setup improvements are well-implemented and previous review feedback has been addressed. However, the three logic bugs in postgres_gis.py (events stuck in "approving" state with no recovery, legacy global opt-out permanently blocking per-channel re-opt-in, unhandled MultipleResultsFound exception in opt-out by phone) directly affect the correctness and compliance guarantees of the new opt-out and re-opt-in flows. Additionally, the reject endpoint lacks the atomic claim guard that was added to approve, allowing a race that can result in a sent message being recorded as rejected. These are production-critical issues for a system handling GDPR/TCPA-regulated messaging.
  • radioshaq/radioshaq/database/postgres_gis.py (three logic bugs in opt-out, event state, and crash recovery), radioshaq/radioshaq/api/routes/emergency.py (reject race condition without atomic claim).

Sequence Diagram

sequenceDiagram
    participant Operator
    participant EmergencyAPI
    participant DB
    participant MessageBus
    participant Twilio

    Operator->>EmergencyAPI: POST /emergency/events/{id}/approve
    EmergencyAPI->>DB: claim_emergency_event_pending(id)<br/>(atomic: pending → approving)
    DB-->>EmergencyAPI: claimed (or None if already processed)
    alt Claim failed
        EmergencyAPI-->>Operator: 400 Event already processed
    else Claim succeeded
        EmergencyAPI->>MessageBus: publish_outbound(channel, phone, message)
        alt Queue full / bus unavailable
            EmergencyAPI->>DB: update status → "pending" (rollback)
            EmergencyAPI-->>Operator: 200 sent=false (retry)
        else Published OK
            EmergencyAPI->>DB: update status → "approved" + audit fields
            MessageBus->>Twilio: SMS / WhatsApp send
            EmergencyAPI-->>Operator: 200 sent=true
        end
    end

    Note over EmergencyAPI,DB: Reject path does NOT use claim_emergency_event_pending<br/>(race: concurrent approve+reject can both succeed)

    Operator->>EmergencyAPI: POST /emergency/events/{id}/reject
    EmergencyAPI->>DB: get_coordination_event_by_id(id)
    DB-->>EmergencyAPI: event (status may already be "approving")
    EmergencyAPI->>DB: update status → "rejected" (no atomic guard)
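The atomic claim in the diagram is a conditional update: the status changes only if the row is still "pending", and the affected-row count tells the caller whether it won. A minimal sketch using stdlib `sqlite3` in place of Postgres; the table name and the `claim` helper are hypothetical, and routing reject through the same guard is shown as one possible remedy for the race.

```python
import sqlite3


def claim(conn: sqlite3.Connection, event_id: int, new_status: str) -> bool:
    """Move an event out of 'pending' atomically; False if someone else got there first."""
    cur = conn.execute(
        "UPDATE coordination_events SET status = ? WHERE id = ? AND status = 'pending'",
        (new_status, event_id),
    )
    conn.commit()
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE coordination_events (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("INSERT INTO coordination_events (id, status) VALUES (1, 'pending')")
conn.commit()

approve_won = claim(conn, 1, "approving")  # approve path claims first
reject_won = claim(conn, 1, "rejected")    # reject arrives late and loses
final_status = conn.execute(
    "SELECT status FROM coordination_events WHERE id = 1"
).fetchone()[0]
```

Because the reject attempt matches no row, it cannot overwrite an event that is already being approved.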

Comments Outside Diff (2)

  1. radioshaq/radioshaq/specialized/relay_tools.py, line 100-104 (link)

    target_band listed as required even for SMS/WhatsApp channel

    The JSON schema for the relay_message_between_bands tool still declares target_band as required in the "required" array:

    "required": ["message", "source_band", "target_band"]

    When target_channel is "sms" or "whatsapp", target_band is unused by the service (the service skips band-plan lookup for non-radio channels). However, the LLM agent reading this schema will always be prompted to supply a target_band, even for SMS/WhatsApp relays where it has no meaningful value to provide. This can confuse the model and produce dummy values like "sms" or "unknown" for what is described as a required radio band.

    Consider making target_band optional when target_channel is not "radio":

    "required": ["message", "source_band"],
    "dependencies": {
        "target_channel": {
            "oneOf": [
                {"properties": {"target_channel": {"const": "radio"}}, "required": ["target_band"]},
                {"properties": {"target_channel": {"enum": ["sms", "whatsapp"]}}}
            ]
        }
    }

    Or document in the description that target_band can be any placeholder value (e.g. "n/a") when target_channel is sms/whatsapp.

  2. radioshaq/radioshaq/database/postgres_gis.py, line 581-584 (link)

    Events permanently stuck in "approving" state after crash

    get_pending_coordination_events (line 583) hard-codes .where(CoordinationEvent.status == "pending"). Events transition to "approving" via claim_emergency_event_pending during approval and are supposed to transition back to "pending" if publish_outbound fails, or to "approved" on success.

    If the server crashes after claim_emergency_event_pending commits status="approving" but before the rollback (when publish fails) or the success write completes, the row stays at "approving" indefinitely. Because get_pending_coordination_events only returns status == "pending" rows, such events become completely invisible to operators — there is no UI, endpoint, or cleanup job that can surface or recover them.

    At minimum, consider also querying for "approving" events that are older than a short timeout (e.g. 5 minutes) and resetting them to "pending":

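    from datetime import datetime, timedelta, timezone

    from sqlalchemy import select

    # Treat "approving" rows older than 5 minutes as recoverable, alongside "pending" rows.
    stale_cutoff = datetime.now(timezone.utc) - timedelta(minutes=5)
    query = (
        select(CoordinationEvent)
        .where(
            (CoordinationEvent.status == "pending")
            | (
                (CoordinationEvent.status == "approving")
                & (CoordinationEvent.updated_at < stale_cutoff)
            )
        )
        ...
    )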

    Alternatively, a startup or periodic cleanup task could reset stale "approving" events back to "pending".
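The periodic cleanup alternative could look roughly like this. The sketch works on an in-memory list of dicts rather than the database, and the function name, the `updated_at` field, and the five-minute threshold are assumptions taken from the suggestion above.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

STALE_AFTER = timedelta(minutes=5)  # threshold suggested in the review


def reset_stale_approving(events: List[Dict], now: datetime, stale_after: timedelta = STALE_AFTER) -> int:
    """Return stale 'approving' events to 'pending' so operators can see them again."""
    reset = 0
    for event in events:
        if event["status"] == "approving" and now - event["updated_at"] > stale_after:
            event["status"] = "pending"
            event["updated_at"] = now
            reset += 1
    return reset


now = datetime(2026, 3, 7, 12, 0, tzinfo=timezone.utc)
events = [
    {"id": 1, "status": "approving", "updated_at": now - timedelta(minutes=10)},  # stale
    {"id": 2, "status": "approving", "updated_at": now - timedelta(minutes=1)},   # in flight
    {"id": 3, "status": "approved", "updated_at": now - timedelta(hours=1)},      # finished
]
reset_count = reset_stale_approving(events, now)
```

Only the ten-minute-old event is reset; the in-flight approval and the finished event are left alone.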

Last reviewed commit: b8e734c

Greptile also left 3 inline comments on this PR.

Comment thread radioshaq/radioshaq/api/routes/emergency.py Outdated
Comment thread radioshaq/radioshaq/specialized/sms_agent.py Outdated
Comment thread radioshaq/radioshaq/api/routes/relay.py Outdated
Comment thread radioshaq/radioshaq/orchestrator/outbound_dispatcher.py Outdated
Comment thread radioshaq/web-interface/src/features/emergency/emergencyAlerts.ts

Josephrp commented Mar 7, 2026

@greptileai : update your review based on the fixes above :

Comment thread radioshaq/radioshaq/api/routes/bus.py
Comment thread radioshaq/radioshaq/api/routes/emergency.py Outdated
Comment thread radioshaq/radioshaq/api/routes/relay.py
Comment thread radioshaq/web-interface/src/features/emergency/EmergencyPage.tsx Outdated
Comment thread radioshaq/radioshaq/api/routes/callsigns.py Outdated

Josephrp commented Mar 7, 2026

@greptileai , update your review based on the changes above :

Comment thread radioshaq/radioshaq/listener/relay_delivery.py Outdated
Comment thread radioshaq/radioshaq/relay/service.py
Comment thread radioshaq/radioshaq/setup.py

Josephrp commented Mar 7, 2026

@greptileai : update your review based on the recent changes :

Comment thread radioshaq/radioshaq/setup.py Outdated
Comment thread radioshaq/radioshaq/api/routes/emergency.py

Josephrp commented Mar 7, 2026

@greptileai : update your review based on the recent changes made :

Comment thread radioshaq/radioshaq/database/postgres_gis.py
Comment thread radioshaq/radioshaq/database/postgres_gis.py
Comment thread radioshaq/radioshaq/api/routes/emergency.py
Josephrp merged commit 7062f4d into dev on Mar 7, 2026
3 checks passed
Josephrp added a commit that referenced this pull request Mar 9, 2026
* Enforce `main <- dev` governance, migrate to GPL-2.0-only, add CLI/UI license gates, and split stable/nightly PyPI release lanes (#16)

* adds memory system (#6)
* adds multiband and relay support (#7)
* Add frequency-aware radio replies (#8)

* Fixes PyPI UI bundling flow (#12)

* fixes build env bug (#13)
* fixes ci issue (#14)
* readme improvements and readthedocs ci (#22)
* adds language support and interface improvements (#23)
* GIS location capture flow, dependency guidance, and test fixes (#25)
* Gis (#27)
* adds Country-specific compliance plugin (FCC, CEPT, R1 band plans) (#28)

* TTS/ASR provider plugin, Hugging Face Inference, local options, .gitignore cleanup (#31)

* SMS/WhatsApp relay, emergency approval, contact preferences & testing improvements (#32)

* Patch (#33)

Development

Successfully merging this pull request may close these issues.

Twilio/SMS and WhatsApp integrations are only partially wired (config mismatch + missing outbound channel handlers)
