Use this runbook to deploy and operate the Kodiai Slack integration in #kodiai.
For the separate inbound webhook-to-Slack relay surface, refer to Slack Webhook Relay instead. That feature is configured through service-level runtime config (`SLACK_WEBHOOK_RELAY_SOURCES`), not through `.kodiai.yml`.
Primary goal: keep Slack behavior deterministic and thread-only while giving responders a fast path from symptom to root cause.
- Slack v1 allows only addressed traffic in the configured Kodiai channel and always replies in-thread.
- A top-level message bootstraps a session only when it explicitly mentions the bot user (`<@SLACK_BOT_USER_ID>`).
- In-thread follow-ups are allowed only after a session has been bootstrapped for that channel+thread.
- The service must acknowledge Slack events immediately; async processing happens after the HTTP 200 response.
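The acknowledge-first rule can be sketched as follows. This framework-agnostic illustration is not the code in `src/routes/slack-events.ts`. The names `ackThenProcess`, `ProcessEvent`, and the `{ status }` return shape are hypothetical. The sketch assumes signature verification and safety rails have already passed.

```typescript
// Hypothetical callback type for the slow work (assistant execution, publishing).
type ProcessEvent = (payload: unknown) => Promise<void>;
// Hypothetical minimal response shape; a real route would return its framework's response.
type AckResult = { status: number };

export function ackThenProcess(
  rawBody: string,
  processEvent: ProcessEvent,
  onError: (err: unknown) => void = (err) => console.error("Slack async processing failed", err),
): AckResult {
  const payload: unknown = JSON.parse(rawBody);

  // Defer the work to a later tick so the HTTP 200 is never blocked by it.
  // Wrapping in Promise.resolve() also catches synchronous throws from processEvent.
  setTimeout(() => {
    void Promise.resolve()
      .then(() => processEvent(payload))
      .catch(onError); // after the ack, failures are visible only in logs
  }, 0);

  return { status: 200 };
}
```

Slack expects a 2xx response to event requests within about three seconds and retries otherwise. Deferring the work keeps slow model calls from causing duplicate deliveries. As a consequence, post-ack failures never surface as HTTP errors, which is why the async log lines matter during triage.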
Run this sequence on every Slack rollout:
- Confirm that runtime config is present (see the environment table below).
- Confirm that bot token scopes include at least `chat:write` and `reactions:write`.
- Deploy using the normal Azure flow in `deployment.md`.
- Validate that health endpoints return success.
- Run the mandatory Slack verification commands.
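The scope check in this sequence can be sketched as below. The sketch relies on the Slack Web API reporting a token's granted scopes in the `x-oauth-scopes` response header. The helper names `missingScopes` and `preflightScopes` are hypothetical. The actual startup preflight may perform this check differently.

```typescript
// Minimum scopes named in this runbook.
const REQUIRED_SCOPES = ["chat:write", "reactions:write"];

// Pure helper: compare a comma-separated scope header against the required list.
export function missingScopes(scopeHeader: string | null, required: string[] = REQUIRED_SCOPES): string[] {
  const granted = new Set(
    (scopeHeader ?? "")
      .split(",")
      .map((s) => s.trim())
      .filter((s) => s.length > 0),
  );
  return required.filter((scope) => !granted.has(scope));
}

// Calls auth.test with the bot token and reads granted scopes from the response header.
export async function preflightScopes(botToken: string): Promise<string[]> {
  const res = await fetch("https://slack.com/api/auth.test", {
    method: "POST",
    headers: { Authorization: `Bearer ${botToken}` },
  });
  const body = (await res.json()) as { ok: boolean; error?: string };
  if (!body.ok) throw new Error(`auth.test failed: ${body.error ?? "unknown_error"}`);
  return missingScopes(res.headers.get("x-oauth-scopes"));
}
```

A non-empty result from `preflightScopes(process.env.SLACK_BOT_TOKEN ?? "")` corresponds to the missing-scope guidance that must be absent from startup logs.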
Before deployment, confirm:
- `SLACK_SIGNING_SECRET`, `SLACK_BOT_TOKEN`, `SLACK_BOT_USER_ID`, and `SLACK_KODIAI_CHANNEL_ID` are set for the target environment.
- `SLACK_ASSISTANT_MODEL` is set explicitly for production (or the default in `src/config.ts` is accepted deliberately).
- Startup logs contain no missing-scope guidance from the Slack preflight (`auth.test`).
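A pre-deploy config check along these lines could catch missing variables before rollout. It is a minimal sketch: `checkSlackEnv` is a hypothetical helper, and it does not reproduce the validation in `src/config.ts`.

```typescript
// Variables marked "Yes" in the environment table.
const REQUIRED_SLACK_VARS = [
  "SLACK_SIGNING_SECRET",
  "SLACK_BOT_TOKEN",
  "SLACK_BOT_USER_ID",
  "SLACK_KODIAI_CHANNEL_ID",
] as const;

export function checkSlackEnv(
  env: Record<string, string | undefined>,
  production: boolean,
): { missing: string[]; warnings: string[] } {
  // Treat empty or whitespace-only values as missing.
  const missing: string[] = REQUIRED_SLACK_VARS.filter((name) => !env[name]?.trim());
  const warnings: string[] = [];
  if (production && !env.SLACK_ASSISTANT_MODEL?.trim()) {
    warnings.push("SLACK_ASSISTANT_MODEL is unset; the default from src/config.ts will be used");
  }
  return { missing, warnings };
}
```

For example, `checkSlackEnv(process.env, process.env.NODE_ENV === "production")` fails the gate when `missing` is non-empty. Using `NODE_ENV` to detect production is an assumption; substitute whatever signal the target environment actually exposes.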
These checks are release gates after every deploy and after any Slack incident fix:

```bash
bun run verify:phase81:smoke
bun run verify:phase81:regression
```

Expected result: all `SLK81-SMOKE-*` and `SLK81-REG-*` checks pass with a final verdict of `PASS`.
Use this table to map each operator command to its machine-checkable check IDs and the immediate triage action:
| Command | Check IDs | What it verifies | First troubleshooting step |
|---|---|---|---|
| `bun run verify:phase81:smoke` | `SLK81-SMOKE-01` through `SLK81-SMOKE-04` | Write-intent routing, ambiguous-input fallback quick actions, high-impact confirmation gating, and success/refusal output shape | Re-run the failing scenario in `src/slack/assistant-handler.test.ts` and compare the message contract text |
| `bun run verify:phase81:regression` | `SLK81-REG-INTENT-01`, `SLK81-REG-HANDLER-01`, `SLK81-REG-CONFIRM-01` | Pinned local suites for write-intent scoring, write-mode handler contracts, and confirmation store semantics | Run the failing suite named in the gate output directly and inspect the first failing assertion |
If any Slack verification gate fails in production:
- Roll back to the previous healthy Container App revision.
- Keep Slack changes blocked until smoke and regression are both green.
- Attach failing check IDs and log evidence to the incident ticket.
Use this table as the source of truth for Slack-specific operational config.
| Variable | Required | Source of truth | Failure symptoms if wrong or missing |
|---|---|---|---|
| `SLACK_SIGNING_SECRET` | Yes | Slack App -> Basic Information -> App Credentials | Ingress returns 401; logs show verification failed in `src/routes/slack-events.ts` |
| `SLACK_BOT_TOKEN` | Yes | Slack App -> OAuth & Permissions -> Bot User OAuth Token | No thread replies; reaction add/remove failures; Slack API auth errors |
| `SLACK_BOT_USER_ID` | Yes | Slack App user ID for the installed bot | Mentions ignored as a missing bootstrap mention in `src/slack/safety-rails.ts` |
| `SLACK_KODIAI_CHANNEL_ID` | Yes | Slack channel ID for #kodiai in the target workspace | All events ignored as outside the channel by the safety rails |
| `SLACK_ASSISTANT_MODEL` | Recommended | Runtime env config (`src/config.ts`) | Unexpected model behavior or fallback to the default model |
| `LOG_LEVEL` | Recommended | Runtime env config (`src/config.ts`) | Missing diagnostics during triage |
| `BOT_ALLOW_LIST` | Optional | Runtime env config (`src/config.ts`) | Bot filtering may reject expected aliases in non-Slack flows |
The baseline runtime variables (`GITHUB_APP_ID`, `GITHUB_PRIVATE_KEY`, `GITHUB_WEBHOOK_SECRET`, `CLAUDE_CODE_OAUTH_TOKEN`) remain required as described in `deployment.md`.
For every incident, start with the delivery evidence model from `docs/runbooks/xbmc-ops.md`: capture the timestamp, the message URL, and the correlated app logs.
Symptom:
- Slack event requests return HTTP 401.
- Logs show `Rejected Slack event: verification failed`.
Checks:
```bash
# Local validator sanity check (replace placeholders)
curl -i -X POST http://localhost:3000/slack/events \
  -H "x-slack-request-timestamp: <ts>" \
  -H "x-slack-signature: v0=<sig>" \
  -H "Content-Type: application/json" \
  --data-binary @tmp/slack-event.json
```

Code pointers:
- `src/routes/slack-events.ts` (raw body handling + 401 path)
- `src/slack/verify.ts` (signature verification)
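To produce a valid `<sig>` for the curl check, or to reason about why verification fails, the sketch below follows Slack's documented v0 signing scheme. The scheme computes an HMAC-SHA256 over `v0:<timestamp>:<raw body>` with the signing secret and rejects stale timestamps. The function names are hypothetical. The five-minute window is Slack's recommendation and may not match what `src/slack/verify.ts` enforces.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Slack recommends rejecting requests whose timestamp is more than 5 minutes off.
const MAX_SKEW_SECONDS = 60 * 5;

// Signature over the exact raw request bytes; re-serialized JSON will not match.
export function computeSlackSignature(signingSecret: string, timestamp: string, rawBody: string): string {
  const base = `v0:${timestamp}:${rawBody}`;
  return "v0=" + createHmac("sha256", signingSecret).update(base).digest("hex");
}

export function verifySlackSignature(
  signingSecret: string,
  timestamp: string,
  signature: string,
  rawBody: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): boolean {
  const ts = Number(timestamp);
  if (!Number.isInteger(ts) || Math.abs(nowSeconds - ts) > MAX_SKEW_SECONDS) return false;

  const expected = Buffer.from(computeSlackSignature(signingSecret, timestamp, rawBody));
  const actual = Buffer.from(signature);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}
```

Two common 401 causes follow directly from this scheme: a body that was parsed and re-serialized before hashing, and a `<ts>` placeholder that is older than the allowed window. The signature must be computed over the same bytes that `--data-binary` sends from `tmp/slack-event.json`.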
Symptom:
- No assistant reply, and logs include `Slack event ignored by v1 safety rails`.
Checks:
- Inspect the `reason` field in the logs (`outside_kodiai_channel`, `missing_bootstrap_mention`, `thread_follow_up_out_of_scope`, etc.).
- Verify the configured channel ID (`SLACK_KODIAI_CHANNEL_ID`) and the bot user ID used for mention matching (`SLACK_BOT_USER_ID`).
Code pointers:
- `src/slack/safety-rails.ts` (decision reasons)
- `src/routes/slack-events.ts` (rail invocation and ignore logging)
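The sketch below shows how the three logged reasons could arise from the v1 rules. It is a simplified model, not the logic in `src/slack/safety-rails.ts`. The `SlackMessage` shape, the `channel:thread_ts` session key, and `evaluateRails` are hypothetical. It also assumes that any in-thread message without a bootstrapped session is out of scope.

```typescript
// Hypothetical subset of a Slack message event.
type SlackMessage = { channel: string; ts: string; thread_ts?: string; text: string };

type RailDecision =
  | { action: "bootstrap"; threadTs: string }
  | { action: "follow_up"; threadTs: string }
  | {
      action: "ignore";
      reason: "outside_kodiai_channel" | "missing_bootstrap_mention" | "thread_follow_up_out_of_scope";
    };

export function evaluateRails(
  msg: SlackMessage,
  cfg: { channelId: string; botUserId: string },
  sessions: Set<string>, // keys of bootstrapped threads: `${channel}:${threadTs}`
): RailDecision {
  if (msg.channel !== cfg.channelId) return { action: "ignore", reason: "outside_kodiai_channel" };

  // A reply inside a thread carries the parent's ts as thread_ts.
  const threadTs = msg.thread_ts;
  if (threadTs !== undefined && threadTs !== msg.ts) {
    return sessions.has(`${msg.channel}:${threadTs}`)
      ? { action: "follow_up", threadTs }
      : { action: "ignore", reason: "thread_follow_up_out_of_scope" };
  }

  // Top-level message: bootstrap only on an explicit <@BOT_USER_ID> mention.
  if (!msg.text.includes(`<@${cfg.botUserId}>`)) {
    return { action: "ignore", reason: "missing_bootstrap_mention" };
  }
  // Replies go into the thread rooted at this message, so register that thread.
  sessions.add(`${msg.channel}:${msg.ts}`);
  return { action: "bootstrap", threadTs: msg.ts };
}
```

This model explains two of the table's failure symptoms. A wrong `SLACK_KODIAI_CHANNEL_ID` produces `outside_kodiai_channel` for every event, and a wrong `SLACK_BOT_USER_ID` produces `missing_bootstrap_mention` even when users do mention the bot.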
Symptom:
- Logs show addressed event accepted, but no Slack thread message appears.
Checks:
- Confirm that the async path emitted `Slack addressed event accepted for async processing`.
- Check the assistant execution and publish-path logs for failures.
- Confirm that the assistant did not unexpectedly answer the repo-context message with a clarifying question.
Code pointers:
- `src/routes/slack-events.ts` (async callback and accepted/failed log lines)
- `src/slack/assistant-handler.ts` (workspace execution, `publishInThread`, clarification path)
Symptom:
- Assistant still replies, but hourglass reaction behavior is missing or inconsistent.
Checks:
- Confirm that the bot token has `reactions:write`.
- Review startup preflight logs for missing-scope warnings.
- Verify that the add/remove handlers are wired into the Slack client integration.
Code pointers:
- `src/slack/assistant-handler.ts` (add/remove working reaction calls)
- `src/slack/client.ts` (Slack API reaction integration)
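The symptom above ("still replies, but the reaction is missing") fits a pattern where reaction calls are best-effort and never block the reply. The sketch below illustrates that pattern under stated assumptions. The `ReactionClient` interface, `withWorkingReaction`, and the emoji name `hourglass_flowing_sand` are hypothetical. A real client would back `addReaction`/`removeReaction` with the Slack Web API methods `reactions.add` and `reactions.remove`.

```typescript
// Hypothetical client interface; see src/slack/client.ts for the real integration.
interface ReactionClient {
  addReaction(channel: string, timestamp: string, name: string): Promise<void>;
  removeReaction(channel: string, timestamp: string, name: string): Promise<void>;
}

// Assumed emoji name for the working indicator.
const WORKING_REACTION = "hourglass_flowing_sand";

export async function withWorkingReaction<T>(
  client: ReactionClient,
  channel: string,
  messageTs: string,
  work: () => Promise<T>,
  log: (msg: string, err: unknown) => void = (m, e) => console.warn(m, e),
): Promise<T> {
  // Best effort: a missing reactions:write scope must not block the reply.
  try {
    await client.addReaction(channel, messageTs, WORKING_REACTION);
  } catch (err) {
    log("working reaction add failed", err);
  }
  try {
    return await work();
  } finally {
    // Always attempt removal, even when the work itself throws.
    try {
      await client.removeReaction(channel, messageTs, WORKING_REACTION);
    } catch (err) {
      log("working reaction remove failed", err);
    }
  }
}
```

With this design, a scope problem shows up only as logged reaction failures while replies keep flowing. That is why the checks above start with the token scopes and preflight logs rather than the reply path.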
Symptom:
- Assistant repeatedly asks for owner/repo clarification.
Checks:
- Confirm that the message contains at most one explicit `owner/repo` override.
- Confirm that the clarification text is deterministic and appears once per ambiguous input.
Code pointers:
- `src/slack/repo-context.ts` (repo parsing and ambiguity detection)
- `src/slack/assistant-handler.ts` (clarification-required publish path)
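The "at most one override" rule can be illustrated with a naive parser. This is a heuristic sketch, not the parser in `src/slack/repo-context.ts`. The token regex, the punctuation stripping, and `resolveRepoContext` are all hypothetical. It also treats any `a/b` token as a candidate, so text like "and/or" would count as one.

```typescript
type RepoContext =
  | { kind: "default" } // no override; use the configured default repo
  | { kind: "override"; repo: string }
  | { kind: "clarify"; candidates: string[] }; // ambiguous: ask once for owner/repo

// Naive owner/repo token shape (assumption, not GitHub's full naming rules).
const REPO_TOKEN = /^[A-Za-z0-9-]+\/[A-Za-z0-9._-]+$/;

export function resolveRepoContext(text: string): RepoContext {
  // Deduplicate case-insensitively so "Acme/Widgets" and "acme/widgets" count once.
  const candidates = new Map<string, string>();
  for (const raw of text.split(/\s+/)) {
    // Strip surrounding brackets, quotes, and trailing sentence punctuation.
    const token = raw.replace(/^[(<"'`]+|[)>"'`,.;:!?]+$/g, "");
    if (REPO_TOKEN.test(token)) candidates.set(token.toLowerCase(), token);
  }
  const unique = Array.from(candidates.values());
  const first = unique[0];
  if (first === undefined) return { kind: "default" };
  if (unique.length === 1) return { kind: "override", repo: first };
  return { kind: "clarify", candidates: unique };
}
```

If a message keeps triggering clarification, compare it against a parser like this one. Stray slash-separated words are a plausible source of unintended extra candidates.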
Symptom:
- `verify:phase81:smoke` or `verify:phase81:regression` exits non-zero.
Checks:
- Capture the exact failing check IDs from the CLI output (`SLK81-SMOKE-*` / `SLK81-REG-*`).
- Run the pinned suite or deterministic scenario named by the failing ID.
- Confirm no recent changes altered confirmation wording, quick-action commands, or final write output shape.
Code pointers:
- `scripts/phase81-slack-write-smoke.ts` (deterministic scenario checks)
- `scripts/phase81-slack-write-regression-gate.ts` (pinned suite mapping)
- `src/slack/assistant-handler.ts` (write routing and publish contracts)
- `src/slack/write-intent.ts` (intent scoring and confirmation-required signals)
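A small helper can pull failing check IDs out of captured gate output for the incident ticket. It is a sketch that assumes the gate prints each check on a line containing both its ID and a `FAIL`/`FAILED`/`FAILURE` marker. That output format is an assumption, so confirm it against the real output of the scripts above. `failingCheckIds` is a hypothetical name.

```typescript
// Matches IDs such as SLK81-SMOKE-02 and SLK81-REG-INTENT-01.
const CHECK_ID = /SLK81-[A-Z]+(?:-[A-Z0-9]+)+/g;
// Assumed failure marker on a check's output line.
const FAIL_MARKER = /\bFAIL(?:ED|URE)?\b/;

export function failingCheckIds(output: string): string[] {
  const failing = new Set<string>();
  for (const line of output.split("\n")) {
    if (!FAIL_MARKER.test(line)) continue;
    // String.prototype.match with a global regex returns every ID on the line (or null).
    for (const id of line.match(CHECK_ID) ?? []) failing.add(id);
  }
  return Array.from(failing).sort();
}
```

Paste the full CLI output into the ticket as well. The extracted IDs are a summary and do not replace the evidence.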
```bash
# Slack write-mode smoke
bun run verify:phase81:smoke
# Slack write-mode regression gate (release blocking)
bun run verify:phase81:regression
```

Always capture the full CLI output and the failing check IDs when escalating incidents.