Based on deep competitive research, codebase audit, and production architecture analysis.
- Require `GATEWAY_AUTH_TOKEN` in production: auth is currently bypassed automatically if it is unset (auth.ts:36-38)
- Fix hard-coded JWT secret: `"karna-cloud-dev-secret-change-me"` in `middleware/auth.ts:27`
- Add startup env validation: fail fast if `ANTHROPIC_API_KEY`, `JWT_SECRET`, or `GATEWAY_AUTH_TOKEN` is missing
- CORS lockdown: default is `"*"`, needs allowlist per environment
- Add security headers: X-Frame-Options, X-Content-Type-Options, CSP, HSTS
- WebSocket origin validation — no origin check on upgrade requests
- Per-message rate limiting on WebSocket connections
- Add `process.on('unhandledRejection')` and `process.on('uncaughtException')` handlers
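
The WebSocket items above (origin validation and per-message rate limiting) could be sketched roughly as follows. This is a minimal, dependency-free illustration, not the gateway's actual code: `isAllowedOrigin`, `MessageRateLimiter`, and the example limits are hypothetical names and values, and a token bucket is only one of several reasonable rate-limiting strategies.

```typescript
// Hypothetical helpers; names and limits are illustrative, not taken from the Karna codebase.

/** Returns true when the Origin header of an upgrade request is on the allowlist. */
function isAllowedOrigin(origin: string | undefined, allowlist: readonly string[]): boolean {
  // Assumption: requests without an Origin header are rejected.
  // Relax this if non-browser clients must be able to connect.
  return origin !== undefined && allowlist.includes(origin);
}

/** Token bucket: each message costs one token; tokens refill at a fixed rate. */
class MessageRateLimiter {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start with a full bucket
    this.lastRefillMs = nowMs;
  }

  /** Returns true if the message may be processed, false if it should be rejected. */
  tryConsume(nowMs: number): boolean {
    // Refill in proportion to the time elapsed since the last call, capped at capacity.
    const elapsedSeconds = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = nowMs;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

One limiter instance per connection keeps a noisy client from affecting others; passing the clock in as `nowMs` (for example `Date.now()`) keeps the class easy to test.
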
- Create `docker-compose.yml` with: gateway, web dashboard, PostgreSQL+pgvector, Redis
- Create `Dockerfile` for gateway and web
- Add health checks for all services
- Document one-command self-hosting: `docker compose up`
- Verify all 13 Supabase migrations run cleanly
- Add missing columns: `memories.embedding vector(1536)`, API key expiration/rotation fields
- Add audit logs table for compliance
- Create `seed.sql` for default agent configuration
- Add Zod-based startup validation for all env vars
- Validate secret strength (JWT secret >= 32 chars, not default value)
- Validate database connectivity at startup
- Add `NODE_ENV` awareness (dev/staging/production behaviors)
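
To make the startup checks above concrete, here is a hand-written sketch of the validation rules. The roadmap calls for Zod; this version deliberately avoids the dependency so it stays self-contained, and a Zod schema could express the same rules. The function names, the `EnvVars` type, and the exact `NODE_ENV` strings are assumptions; database connectivity checks are omitted.

```typescript
// Hand-written sketch of the startup checks; a Zod schema could express the same rules.
type EnvVars = Record<string, string | undefined>;

const REQUIRED_VARS = ["ANTHROPIC_API_KEY", "JWT_SECRET", "GATEWAY_AUTH_TOKEN"];
// Assumption: these are the exact NODE_ENV strings used for dev/staging/production.
const ALLOWED_NODE_ENVS = ["development", "staging", "production"];
// The hard-coded default cited in the roadmap; it must never be accepted as a real secret.
const DEFAULT_JWT_SECRET = "karna-cloud-dev-secret-change-me";
const MIN_JWT_SECRET_LENGTH = 32;

/** Returns a list of human-readable problems; an empty list means the environment is valid. */
function validateEnv(env: EnvVars): string[] {
  const problems: string[] = [];

  for (const key of REQUIRED_VARS) {
    const value = env[key];
    if (value === undefined || value.trim() === "") {
      problems.push(`${key} is missing`);
    }
  }

  const nodeEnv = env["NODE_ENV"];
  if (nodeEnv === undefined || !ALLOWED_NODE_ENVS.includes(nodeEnv)) {
    problems.push(`NODE_ENV must be one of: ${ALLOWED_NODE_ENVS.join(", ")}`);
  }

  // Strength checks only run when a secret is present, to avoid double-reporting.
  const jwtSecret = env["JWT_SECRET"];
  if (jwtSecret !== undefined && jwtSecret.trim() !== "") {
    if (jwtSecret.length < MIN_JWT_SECRET_LENGTH) {
      problems.push(`JWT_SECRET must be at least ${MIN_JWT_SECRET_LENGTH} characters`);
    }
    if (jwtSecret === DEFAULT_JWT_SECRET) {
      problems.push("JWT_SECRET must not be the default development value");
    }
  }

  return problems;
}

/** Fail-fast wrapper intended to be called once at process start. */
function assertValidEnv(env: EnvVars): void {
  const problems = validateEnv(env);
  if (problems.length > 0) {
    throw new Error(`Invalid environment:\n- ${problems.join("\n- ")}`);
  }
}
```

Calling `assertValidEnv(process.env)` before the gateway binds its port would give the fail-fast behavior. Note that the default secret is itself exactly 32 characters long, so the length rule alone would not catch it.
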
- Document ingestion endpoint (PDF, Markdown, HTML, DOCX)
- Recursive character chunking (400 tokens, 50-token overlap)
- Embedding generation via OpenAI `text-embedding-3-small`
- Store chunks in pgvector with metadata (source, section, type)
- Hybrid search: pgvector similarity + PostgreSQL `tsvector` BM25
- Reciprocal Rank Fusion for combining search results
- Cross-encoder reranking (top-20 → top-5)
- New tool: `knowledge_search` for the agent to query the document store
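
As an illustration of the fusion step in the retrieval pipeline above, the sketch below combines a vector-search ranking and a keyword-search ranking with Reciprocal Rank Fusion. The function name is hypothetical, and the constant `k = 60` is a commonly used default rather than a value specified by this roadmap.

```typescript
/**
 * Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d)),
 * where rank starts at 1 and documents missing from a ranking contribute nothing.
 */
function reciprocalRankFusion(
  rankings: readonly (readonly string[])[],
  k = 60, // assumption: commonly used default, tune against real queries
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  // Highest fused score first.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort((a, b) => b.score - a.score);
}

// Example: chunk IDs as returned by the two searches, best match first.
const vectorHits = ["a", "b", "c"];
const keywordHits = ["b", "d", "a"];
const fused = reciprocalRankFusion([vectorHits, keywordHits]);
// fused order: b, a, d, c (b and a appear in both lists, so they outrank d and c)
```

RRF only needs rank positions, so the similarity and keyword scores never have to be normalized onto a common scale. The top of the fused list (for example the top 20) would then be passed to the reranker.
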
- Working memory: In-process per-request (current conversation context)
- Short-term memory: PostgreSQL session-scoped (hours/days)
- Long-term memory: PostgreSQL + pgvector persistent (user prefs, facts, entities)
- Memory importance scoring: `recency_decay * access_frequency * relevance`
- Rolling summarization when history exceeds token budget
- Memory garbage collection for low-importance entries
- Tag memories as episodic/semantic/procedural with different retention
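
One possible reading of the importance formula and garbage-collection step above is sketched below. The roadmap only gives the product `recency_decay * access_frequency * relevance`; the exponential half-life decay, the logarithmic access term, the 72-hour half-life, and all names here are assumptions chosen for illustration.

```typescript
interface MemoryEntry {
  id: string;
  lastAccessedAtMs: number; // epoch milliseconds of the most recent access
  accessCount: number;      // how many times the memory has been retrieved
  relevance: number;        // assumed to be normalized to the range 0..1
}

const MS_PER_HOUR = 60 * 60 * 1000;
const DEFAULT_HALF_LIFE_HOURS = 72; // assumption: score halves every three days

/** importance = recency_decay * access_frequency * relevance */
function importanceScore(
  entry: MemoryEntry,
  nowMs: number,
  halfLifeHours = DEFAULT_HALF_LIFE_HOURS,
): number {
  const ageHours = Math.max(0, nowMs - entry.lastAccessedAtMs) / MS_PER_HOUR;
  // Exponential decay: 1.0 when just accessed, 0.5 after one half-life.
  const recencyDecay = Math.pow(0.5, ageHours / halfLifeHours);
  // Log-damped so frequently used memories gain, but with diminishing returns;
  // the leading 1 keeps never-accessed memories from scoring zero.
  const accessFrequency = 1 + Math.log(1 + entry.accessCount);
  return recencyDecay * accessFrequency * entry.relevance;
}

/** Returns the IDs of entries whose score falls below the threshold. */
function selectForGarbageCollection(
  entries: readonly MemoryEntry[],
  nowMs: number,
  threshold: number,
): string[] {
  return entries
    .filter((entry) => importanceScore(entry, nowMs) < threshold)
    .map((entry) => entry.id);
}
```

Different retention for episodic, semantic, and procedural memories could be modeled by passing a different `halfLifeHours` per tag.
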
- Supervisor/Worker pattern for complex tasks
- Agent definitions as config: `{ name, systemPrompt, model, tools[], handoffTargets[] }`
- Explicit handoff protocol with `HandoffPayload` schema
- Maximum handoff depth (5) to prevent loops
- Router agent for channel dispatch to specialized agents
- Agent registry in gateway for managing multiple agent instances
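
The agent config shape and the depth-limited handoff described above might look like the following. The `AgentDefinition` fields come from the roadmap; the fields of `HandoffPayload` are not specified there, so the ones shown (`fromAgent`, `toAgent`, `task`, `depth`) and the `createHandoff` helper are hypothetical.

```typescript
interface AgentDefinition {
  name: string;
  systemPrompt: string;
  model: string;
  tools: string[];
  handoffTargets: string[];
}

// Hypothetical shape: the roadmap names HandoffPayload but does not define its fields.
interface HandoffPayload {
  fromAgent: string;
  toAgent: string;
  task: string;
  depth: number; // number of handoffs in the chain so far, starting at 1
}

const MAX_HANDOFF_DEPTH = 5;

/** Validates a handoff against the registry and the depth limit, then builds the payload. */
function createHandoff(
  registry: Map<string, AgentDefinition>,
  fromAgent: string,
  toAgent: string,
  task: string,
  previous?: HandoffPayload,
): HandoffPayload {
  const source = registry.get(fromAgent);
  if (source === undefined) throw new Error(`Unknown agent: ${fromAgent}`);
  if (!registry.has(toAgent)) throw new Error(`Unknown agent: ${toAgent}`);
  if (!source.handoffTargets.includes(toAgent)) {
    throw new Error(`${fromAgent} is not allowed to hand off to ${toAgent}`);
  }
  const depth = (previous?.depth ?? 0) + 1;
  if (depth > MAX_HANDOFF_DEPTH) {
    // Two agents bouncing a task back and forth would otherwise loop forever.
    throw new Error(`Handoff depth ${depth} exceeds the maximum of ${MAX_HANDOFF_DEPTH}`);
  }
  return { fromAgent, toAgent, task, depth };
}
```

Threading the previous payload through each call is what lets the depth limit catch ping-pong loops between a supervisor and a worker.
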
- Deploy Langfuse via Docker Compose alongside Karna
- Instrument agent runtime with Langfuse/OpenTelemetry SDK
- Track: latency (p50/p95/p99), token usage, cost, tool success rates
- Trace visualization for agent loops (context → model → tools → response)
- Alerts: p99 > 10s, error rate > 5%, loop detection > 10 steps
- Dashboard for conversation quality metrics
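
The alert thresholds listed above (p99 > 10s, error rate > 5%, loop detection > 10 steps) can be expressed as a small pure function over a metrics window. This is only a sketch of the rule logic: in practice these values would come from Langfuse or OpenTelemetry data, and the `MetricsWindow` shape, alert names, and nearest-rank percentile method are assumptions.

```typescript
interface MetricsWindow {
  latenciesMs: number[]; // one entry per request in the window
  requestCount: number;
  errorCount: number;
  maxLoopSteps: number;  // longest agent loop observed in the window
}

type AlertName = "p99_latency" | "error_rate" | "loop_detection";

/** Nearest-rank percentile; returns 0 for an empty sample. */
function percentile(values: readonly number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index] ?? 0;
}

/** Applies the three thresholds from the roadmap and returns the alerts that fire. */
function evaluateAlerts(window: MetricsWindow): AlertName[] {
  const alerts: AlertName[] = [];
  if (percentile(window.latenciesMs, 99) > 10_000) alerts.push("p99_latency");
  const errorRate = window.requestCount === 0 ? 0 : window.errorCount / window.requestCount;
  if (errorRate > 0.05) alerts.push("error_rate");
  if (window.maxLoopSteps > 10) alerts.push("loop_detection");
  return alerts;
}
```

The same `percentile` helper covers the p50 and p95 figures mentioned in the tracking item.
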
- Backpressure handling: monitor `ws.bufferedAmount`, pause on high watermark
- Redis-backed replay buffer for reconnection recovery
- Sequence numbers on stream chunks for gap detection
- Periodic checkpoint events with accumulated text
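
The sequence-number and replay ideas above can be sketched without any infrastructure. The version below uses an in-memory array as a stand-in for the Redis-backed buffer, purely to show the protocol; `StreamChunk`, `ReplayBuffer`, and `hasGap` are hypothetical names.

```typescript
interface StreamChunk {
  seq: number; // monotonically increasing, starting at 1
  text: string;
}

/** In-memory stand-in for the Redis-backed replay buffer described in the roadmap. */
class ReplayBuffer {
  private readonly maxChunks: number;
  private chunks: StreamChunk[] = [];
  private nextSeq = 1;

  constructor(maxChunks: number) {
    this.maxChunks = maxChunks;
  }

  /** Assigns the next sequence number and retains the chunk for later replay. */
  append(text: string): StreamChunk {
    const chunk: StreamChunk = { seq: this.nextSeq, text };
    this.nextSeq += 1;
    this.chunks.push(chunk);
    // Drop the oldest chunk once the buffer is full.
    if (this.chunks.length > this.maxChunks) this.chunks.shift();
    return chunk;
  }

  /** Chunks a reconnecting client has not seen yet, in order. */
  replayAfter(lastSeenSeq: number): StreamChunk[] {
    return this.chunks.filter((chunk) => chunk.seq > lastSeenSeq);
  }
}

/** Client-side check: true when at least one chunk was skipped. */
function hasGap(lastSeenSeq: number, incoming: StreamChunk): boolean {
  // Duplicates (seq <= lastSeenSeq) are not gaps; the client can simply ignore them.
  return incoming.seq > lastSeenSeq + 1;
}
```

On reconnect, the client would send its last seen sequence number and the server would answer with `replayAfter(lastSeenSeq)`. If the requested range has already been evicted, a checkpoint event carrying the accumulated text is the fallback.
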
- Complete the MCP server in `gateway/src/mcp/`
- Expose Karna's tools via MCP for external consumption
- MCP client for connecting to external MCP servers (databases, APIs)
- Registry of popular MCP servers users can connect with one click
- Cron-based task scheduling (daily briefings, periodic checks)
- Event-triggered workflows (webhook → skill chain)
- Background agent execution with notification on completion
- Branch conversations to explore alternatives
- Fork a conversation at any point
- Compare outcomes of different branches
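
Branching and forking as described above fall out naturally if messages are stored as a tree with parent pointers instead of a flat list. The sketch below is one possible data model, not a description of existing Karna code; `ConversationTree` and its methods are hypothetical.

```typescript
interface ConversationMessage {
  id: string;
  parentId: string | null; // null for the first message of a conversation
  role: "user" | "assistant";
  text: string;
}

class ConversationTree {
  private readonly messages = new Map<string, ConversationMessage>();
  private counter = 0;

  /** Adds a message under the given parent. Adding to a non-leaf parent creates a fork. */
  add(parentId: string | null, role: "user" | "assistant", text: string): ConversationMessage {
    if (parentId !== null && !this.messages.has(parentId)) {
      throw new Error(`Unknown parent message: ${parentId}`);
    }
    this.counter += 1;
    const message: ConversationMessage = { id: `m${this.counter}`, parentId, role, text };
    this.messages.set(message.id, message);
    return message;
  }

  /** Returns the linear history of one branch, from the root down to the given message. */
  history(leafId: string): ConversationMessage[] {
    const path: ConversationMessage[] = [];
    let currentId: string | null = leafId;
    while (currentId !== null) {
      const message: ConversationMessage | undefined = this.messages.get(currentId);
      if (message === undefined) throw new Error(`Unknown message: ${currentId}`);
      path.unshift(message);
      currentId = message.parentId;
    }
    return path;
  }
}
```

Comparing branches then amounts to calling `history` on two leaves and diffing everything after their last shared ancestor.
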
- Multiple users can join a session via invite link
- Redis pub/sub for cross-instance event fan-out
- Per-session event sourcing for full audit trail
- Shared context with per-user permissions
- Publish skills to a community registry
- Install/uninstall skills via CLI or web dashboard
- Skill versioning and dependency management
- Rating and review system
- WhatsApp: Migrate from Baileys (ToS violation) to official WhatsApp Business API
- Telegram: Add media handling, group chat support, webhook mode
- Discord: Add embed formatting, thread support, permission checks
- Slack: Add Block Kit formatting, file handling, workspace context
- SMS: Complete Twilio webhook handler, add DLR tracking, signature validation
- iMessage: Implement or remove (macOS-only, limited API)
- News Digest: Wire `fetchNewsForTopic()` to web_search tool (currently returns empty)
- All stubs: Audit and implement or remove: daily-briefing, travel-planner, meeting-prep, expense-tracker, health-tracker, smart-home
- Add skill testing framework
- Verify STT (Whisper) integration works end-to-end
- Verify TTS (ElevenLabs) integration works end-to-end
- Add voice activity detection for mobile
- Stream audio for real-time voice conversations
- Docker containers + seccomp profiles for shell/code execution
- Capability-based permission system per tool
- Resource limits: CPU (1 core), memory (512MB), time (30s), network (allowlist)
- Separate trust tiers: API calls (in-process) vs code exec (container)
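
The resource limits above map fairly directly onto standard `docker run` flags. The sketch below only builds the argument list and a timeout; it does not spawn anything. `SandboxLimits`, `buildSandboxCommand`, and the seccomp profile path are hypothetical, and network access is simply disabled here because an egress allowlist needs a custom network setup that is outside the scope of this example.

```typescript
interface SandboxLimits {
  cpus: number;
  memoryMb: number;
  timeoutSeconds: number;
  seccompProfilePath: string;
}

interface SandboxCommand {
  args: string[];    // arguments to pass to the `docker` binary
  timeoutMs: number; // enforced by the caller, e.g. by killing the container
}

// Values from the roadmap; the profile path is a placeholder.
const DEFAULT_LIMITS: SandboxLimits = {
  cpus: 1,
  memoryMb: 512,
  timeoutSeconds: 30,
  seccompProfilePath: "./seccomp-profile.json",
};

/** Builds a `docker run` invocation that applies CPU, memory, seccomp, and network limits. */
function buildSandboxCommand(
  image: string,
  command: readonly string[],
  limits: SandboxLimits = DEFAULT_LIMITS,
): SandboxCommand {
  const args = [
    "run",
    "--rm",                      // remove the container when it exits
    `--cpus=${limits.cpus}`,
    `--memory=${limits.memoryMb}m`,
    "--security-opt",
    `seccomp=${limits.seccompProfilePath}`,
    "--network",
    "none",                      // assumption: no network; an allowlist needs extra setup
    image,
    ...command,
  ];
  // docker run has no wall-clock limit of its own, so the time budget is returned
  // separately for the caller to enforce.
  return { args, timeoutMs: limits.timeoutSeconds * 1000 };
}
```

The caller could pass `args` to `child_process.spawn("docker", args)` and stop the container when `timeoutMs` elapses.
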
- Helm chart for Karna deployment
- HPA for gateway (scale on connection count)
- Agent worker scaling via Redis/NATS job queue
- Sticky sessions for WebSocket on Ingress
- GitHub Actions: lint, typecheck, test, build on every PR
- Container image builds and push to GHCR
- Automated Supabase migration runs
- E2E test suite with Playwright
- Redis caching for session data and hot queries
- Connection pooling for database
- Prompt caching (Anthropic cache_control)
- Batch embedding generation for bulk document ingestion
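
For the prompt-caching item above, the sketch below builds a request body for the Anthropic Messages API in which the stable system prompt is marked with `cache_control`. It only constructs the JSON payload and makes no network call; the interface and function names are hypothetical, and the model name is left to the caller.

```typescript
interface SystemTextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

interface MessagesRequestBody {
  model: string;
  max_tokens: number;
  system: SystemTextBlock[];
  messages: Array<{ role: "user" | "assistant"; content: string }>;
}

/**
 * Marks the long, rarely changing system prompt as cacheable so that repeated
 * requests with the same prefix can reuse it instead of reprocessing it.
 */
function buildCachedRequest(
  model: string,
  stableSystemPrompt: string,
  userMessage: string,
  maxTokens = 1024,
): MessagesRequestBody {
  return {
    model,
    max_tokens: maxTokens,
    system: [
      {
        type: "text",
        text: stableSystemPrompt,
        cache_control: { type: "ephemeral" },
      },
    ],
    messages: [{ role: "user", content: userMessage }],
  };
}
```

Caching pays off only when the marked prefix is identical across requests, so per-request content such as the user message belongs after the cached block.
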
| Decision | Choice | Rationale |
|---|---|---|
| Vector DB | pgvector (start) → Qdrant (scale) | Single-DB simplicity; proven to 10-20M vectors |
| Observability | Langfuse (self-hosted) | MIT, OTEL-native, Docker Compose deploy |
| Memory | 3-tier in PostgreSQL | ACID, hybrid search, row-level security |
| RAG | Hybrid BM25+vector with reranker | Consensus best practice |
| Multi-agent | Supervisor/Worker + Router | Covers 90% of use cases |
| Tool sandbox | Docker+seccomp → Firecracker (scale) | Defense-in-depth, zero-trust |
| Deployment | Docker Compose → K8s | Fastest to production, clear scale path |
| Streaming | WebSocket + backpressure + Redis replay | Handles slow clients, reconnection |