The operating system for autonomous organizations.
Your agents have amnesia. They can't talk to each other. You spend 80% of your time as a highly paid secretary, copy-pasting context between them. Osabio is the self-correcting knowledge graph that gives your agents shared memory, governed autonomy, and verifiable intent.
You are the integration layer between your AI agents.
You already use AI agents. Your coding agent writes your code. Your chat assistant helps you think through architecture. Your editor agent autocompletes. But none of them share context, so you end up doing it yourself:
- Copy error logs from CI into your coding agent
- Relay architecture decisions from a chat into your codebase
- Manually track what was decided, what's blocked, what changed
- Re-explain project state every time you start a new session
- Context-switch between tools, losing continuity at every step
Every agent you add makes it worse. Osabio fixes this — not by replacing your agents, but by giving them shared memory.
Why a graph, not a message bus. Most platforms try "agent swarms" — agents messaging agents. That creates a game of telephone where instructions get distorted. The graph is the single source of truth.
Human Layer
→ Web Chat / Feed / Graph View / Learning Library / Policy Management
→ Skill Library / Agent Registry / Tool Registry / Terminal
Agent Layer
→ Architect / Strategist / Management / Design Partner / Observer
→ Sandbox Agents (Claude Code, Codex, Aider — isolated runtime)
Graph Layer
→ Projects / Decisions / Tasks / Observations / Features / Questions
→ Suggestions / Conversations / Commits / Intents / Learnings
→ Objectives / Behaviors / Traces / Policies / Skills / Agents
Auth Layer
→ OAuth 2.1 / RAR (RFC 9396) / DPoP (RFC 9449) / Better Auth IdP
→ Evidence-backed intent authorization / Graduated enforcement
Integration Layer
→ GitHub / Slack / Git Hooks / MCP Protocol / MCP Tool Registry / ERC-8004
| Approach | Agent swarms / message buses | Knowledge graph (Osabio) |
|---|---|---|
| Logic | Scripted workflows (if A, do B) | State-based graph (emergent logic) |
| Memory | Ephemeral, session-based | Persistent, pruned, versioned |
| Coordination | Agents message agents (telephone game) | Agents read/write to shared truth |
| Verification | Assumes API calls work | Continuous telemetry (reality grounding) |
| Autonomy | "Let it rip" (high risk of loops) | Authority scopes (risk-managed) |
| Over time | Performance degrades | System gets smarter via learnings |
| Security | Sandbox isolation (the "box") | Governance graph + sandbox + evidence-backed intents |
| Auditing | Log-based (text dumps) | Graph-based (hierarchical traces, machine-readable) |
Each agent has a role, a domain, and authority scopes. They coordinate through the knowledge graph — not through you.
- Architect — Technical decisions, system design, architecture constraints. Checks implementations against what was decided. Resolves conflicts between competing approaches.
- Strategist — Market positioning, pricing, GTM, competitive response. Challenges product decisions against business viability.
- Management — Task tracking, priority management, execution velocity. Flags blocked work, stale decisions, and resource conflicts.
- Sandbox Agents — Your coding tools (Claude Code, Codex, Aider) running in isolated sandboxed environments with their own runtime, proxy tokens, and MCP connections. Created and managed through the Agent Registry UI with configurable authority scopes (11 actions, each auto/propose/blocked) and assigned skills for domain expertise. Agents are assigned to tasks from the orchestrator and execute within intent-gated governance — every tool call goes through scope evaluation before execution.
- Design Partner — Brainstorms product ideas, asks probing questions, identifies gaps. Shapes vague ideas into structured projects, features, and decisions.
- Observer — Continuously scans the graph for contradictions between decisions, stale tasks, status drift, and cross-project conflicts. Findings are verified through LLM reasoning pipelines with confidence scoring and evidence tracking. A peer review layer cross-validates observations to prevent false positives. Synthesizes recurring patterns into actionable suggestions. Proposes learnings from root cause analysis so the system self-corrects.
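The scope evaluation that gates every sandbox tool call can be pictured as a lookup from action to mode. The TypeScript below is a minimal sketch, not Osabio's implementation: the action names are hypothetical (the 11 actions are not listed in this README), and treating unconfigured actions as blocked is an assumption based on the "start restrictive" guidance.

```typescript
// Hypothetical sketch: type names, action names, and the Decision shape are
// illustrative assumptions, not Osabio's actual API.
type ScopeMode = "auto" | "propose" | "blocked";

type Decision =
  | { kind: "execute" }
  | { kind: "queue_for_review" }
  | { kind: "reject"; reason: string };

// Illustrative subset: the README says there are 11 actions but does not name them.
type AgentAction =
  | "log_observation"
  | "update_task_status"
  | "create_task"
  | "confirm_decision";

type AuthorityScopes = Partial<Record<AgentAction, ScopeMode>>;

// Evaluate one tool call against the agent's scopes before it executes.
function evaluateScope(scopes: AuthorityScopes, action: AgentAction): Decision {
  // Anything not explicitly configured is treated as blocked ("start restrictive").
  const mode: ScopeMode = scopes[action] ?? "blocked";
  if (mode === "auto") return { kind: "execute" };
  if (mode === "propose") return { kind: "queue_for_review" };
  return { kind: "reject", reason: `action "${action}" is blocked for this agent` };
}

// Example: a sandbox agent that may log observations and move its own tasks,
// must propose new tasks, and is never allowed to confirm decisions.
const sandboxScopes: AuthorityScopes = {
  log_observation: "auto",
  update_task_status: "auto",
  create_task: "propose",
};

console.log(evaluateScope(sandboxScopes, "log_observation").kind); // execute
console.log(evaluateScope(sandboxScopes, "create_task").kind); // queue_for_review
console.log(evaluateScope(sandboxScopes, "confirm_decision").kind); // reject
```

Defaulting to `blocked` means a newly added action stays inert until someone grants it, which fits expanding trust over time rather than revoking it after the fact.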
No agent messages another agent. They write structured signals to the knowledge graph. The graph makes it visible to the right agent at the right time.
- Sandbox agent notices a contradiction. While implementing rate limiting, a sandbox agent detects that `src/billing/api.ts` uses REST, but the graph has a confirmed decision to standardize on tRPC. It logs an observation via its intent-gated MCP connection.
- Architect agent sees it on next context load. The Architect checks the observation against constraints, confirms the contradiction is real, and generates a suggestion: "Migrate billing API to tRPC, or revisit the standardization decision."
- Suggestion surfaces in your feed. You see the suggestion with full provenance — the observation, the contradicted decision, the Architect's reasoning. You accept it. A migration task is created with one click.
- Next sandbox agent picks up the task. You assign the migration task to a sandbox agent from the orchestrator. It loads the relevant skills, decomposes into subtasks, and executes — each tool call authorized by evidence-backed intents. Status rolls up automatically. No human copied anything between tabs.
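To make the "no agent messages another agent" pattern concrete, here is a sketch of the same walkthrough against a toy in-memory graph. `KnowledgeGraph`, the edge names (`contradicts`, `based_on`, `derived_from`), and the status strings are hypothetical stand-ins for the real SurrealDB-backed graph. The point is that every step only reads and writes graph nodes, so provenance falls out of the edges.

```typescript
// Hypothetical in-memory stand-in for the knowledge graph. Node kinds mirror the
// README; the class, fields, and edge names are illustrative assumptions.
type NodeKind = "decision" | "observation" | "suggestion" | "task";

interface GraphNode {
  id: string;
  kind: NodeKind;
  text: string;
  author: string;
  status: string;
}

interface Edge {
  from: string;
  to: string;
  rel: string;
}

class KnowledgeGraph {
  private nodes = new Map<string, GraphNode>();
  private edges: Edge[] = [];
  private seq = 0;

  add(kind: NodeKind, text: string, author: string, status: string): GraphNode {
    const node: GraphNode = { id: `${kind}:${++this.seq}`, kind, text, author, status };
    this.nodes.set(node.id, node);
    return node;
  }

  link(from: string, to: string, rel: string): void {
    this.edges.push({ from, to, rel });
  }

  find(kind: NodeKind, status: string): GraphNode[] {
    return Array.from(this.nodes.values()).filter((n) => n.kind === kind && n.status === status);
  }

  // Follow outgoing edges of one relation type (used to walk provenance).
  targets(from: string, rel: string): GraphNode[] {
    return this.edges
      .filter((e) => e.from === from && e.rel === rel)
      .map((e) => this.nodes.get(e.to) as GraphNode);
  }
}

const graph = new KnowledgeGraph();
const decision = graph.add("decision", "Standardize on tRPC", "human", "confirmed");

// 1. The sandbox agent writes an observation; it never messages the Architect.
const observation = graph.add("observation", "src/billing/api.ts uses REST", "sandbox-agent", "open");
graph.link(observation.id, decision.id, "contradicts");

// 2. On its next context load, the Architect reads open observations from the graph.
for (const obs of graph.find("observation", "open")) {
  if (graph.targets(obs.id, "contradicts").length === 0) continue;
  const s = graph.add("suggestion", "Migrate billing API to tRPC, or revisit the decision", "architect", "pending");
  graph.link(s.id, obs.id, "based_on");
  obs.status = "triaged";
}

// 3. A human accepts the suggestion, which becomes a task linked to its evidence.
const suggestion = graph.find("suggestion", "pending")[0];
suggestion.status = "accepted";
const task = graph.add("task", "Migrate billing API to tRPC", "human", "todo");
graph.link(task.id, suggestion.id, "derived_from");

// Walk provenance: task -> suggestion -> observation -> contradicted decision.
const source = graph.targets(task.id, "derived_from")[0];
const evidence = graph.targets(source.id, "based_on")[0];
const contradicted = graph.targets(evidence.id, "contradicts")[0];
console.log([task.id, source.id, evidence.id, contradicted.id].join(" -> "));
// task:4 -> suggestion:3 -> observation:2 -> decision:1
```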
- Decisions — Every decision is tracked — who made it, why, and what alternatives were considered. Agents propose. Humans confirm.
- Observations — Agents surface contradictions, gaps, patterns, and risks as they work. Observations accumulate and compound into actionable suggestions.
- Suggestions — Agents tell you what you should be thinking about. Accept a suggestion and it becomes a task, decision, or feature — with full trace back to the evidence.
- Projects, Features & Tasks — Work breaks down hierarchically. Agents decompose tasks at runtime. Status rolls up automatically.
- Questions — When an agent doesn't know, it asks instead of guessing. You answer. The answer becomes a decision in the graph.
- Conversations — Every chat produces structured knowledge. Conversations group by project automatically.
- Commits — Code is linked to the decisions and tasks it implements. Contradictions are caught before they land.
- Intents — Every agent action starts as an intent: a structured request in the graph. Intents require evidence-backed authorization: agents must provide verifiable graph references (`evidence_refs`) that are validated through a deterministic pipeline (existence, workspace scope, temporal ordering, status liveness, authorship independence) before LLM evaluation. Graduated enforcement moves workspaces from a bootstrap exemption, through soft enforcement (missing evidence raises the risk score), to hard enforcement (insufficient evidence rejects the intent). Workspace admins configure per-action evidence thresholds via policy rules.
- Authority Scopes — Control what each agent can do without asking. 11 configurable actions per agent, each set to auto-approve, propose-for-review, or blocked. Start restrictive. Expand trust over time.
- Learnings — Behavioral rules injected into agent prompts at runtime via just-in-time (JIT) loading with token budgets. Learnings follow a lifecycle (`proposed` → `active` → `deactivated`) and can be created by humans, suggested by agents, or proposed by the Observer from root cause analysis. Three-layer collision detection prevents duplicates. Pattern detection identifies recurring issues and suggests learnings automatically. The Learning Library UI lets you browse, filter, approve, edit, and deactivate learnings across all agents.
- Objectives — Strategic goals that give agent work direction. Objectives link to projects, features, and tasks, providing alignment context for every intent. Progress is computed by traversing the linked intents in the graph. The Observer audits for orphaned decisions and stale objectives via coherence scans.
- Behaviors — Measurable behavioral expectations attached to objectives. Each behavior has a scoring function (LLM-evaluated or definition-matched), trend analysis (drift, improvement, flat-line detection), and a bridge to the learning system that proposes corrective learnings when scores decline. Behaviors feed into policy enforcement — the Authorizer checks behavior scores before granting intent authorization. Workspace admins define behavior definitions with configurable scoring criteria, thresholds, and remediation guidance.
- Identity — One person across all tools. Your Slack, GitHub, and terminal sessions all resolve to the same identity.
- Skills — Graph-native behavioral expertise documents that give agents proactive domain knowledge from their first session. Skills sit between Tools (functional capabilities) and Learnings (reactive corrections): they encode governed, versionable expertise like "supply chain risk assessment" or "compliance audit procedures." Skills follow a lifecycle (`draft` → `active` → `deprecated`), link to agents via `possesses` edges, and can be governed by policies via `governs_skill` relations. The Skill Library UI lets you create, browse, activate, and assign skills to agents.
- Agents — First-class graph entities representing sandboxed coding agents (Claude Code, Codex, Aider). Each agent has a runtime configuration, sandbox settings, authority scopes, assigned skills, and available tools. Created through a 3-step wizard (Config → Skills → Tools) in the Agent Registry UI. Agents are assigned to tasks and execute in isolated sessions with intent-gated MCP governance.
- MCP Tool Registry — Centralized discovery and management of MCP tools across external servers. Includes OAuth 2.1 discovery (RFC 9728), dynamic client registration, PKCE authorization, and automatic token refresh for authenticated MCP servers. Credentials are encrypted with AES-256-GCM. Workspace admins control tool access through grants and governance policies. The Tool Registry UI provides tabs for providers, accounts, tools, grants, MCP servers, and a discovery review panel for selective import.
- Agent Sessions — Every session is remembered. The next agent knows what the last one did.
- Traces — Every agent execution is a graph-native call tree. Subagent spawns, tool calls, and decisions form a hierarchical trace you can traverse, query, and audit. Forensic debugging is a graph query, not grep.
- Policies — Deterministic governance rules stored as graph nodes, not prompt text. Each policy carries typed rules, scopes, and approval requirements. Policies follow a lifecycle (`draft` → `active` → `deprecated`) with version chains: create a new version and the previous one is superseded atomically. The Policy Management UI lets you create, activate, deprecate, version, diff, and trace policies. The Authorizer evaluates intents against the policy graph before minting tokens, so no prompt rewriting is needed.
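The deterministic evidence checks and graduated enforcement described under Intents can be sketched as a pipeline that runs before any LLM call. This is an illustration under assumptions: the `EvidenceNode` and `Intent` shapes, the per-action threshold of 2, and the +0.25 risk per missing reference are invented for the example, not taken from Osabio.

```typescript
// Hypothetical sketch of the deterministic evidence pipeline that runs before
// LLM evaluation. Shapes, threshold, and risk increments are assumptions.
type EnforcementMode = "bootstrap" | "soft" | "hard";

interface EvidenceNode {
  id: string;
  workspaceId: string;
  createdAt: number; // epoch ms
  status: "active" | "superseded" | "deprecated";
  authorId: string;
}

interface Intent {
  workspaceId: string;
  requesterId: string;
  createdAt: number; // epoch ms
  evidenceRefs: string[]; // the intent's evidence_refs
}

// Each check returns a rejection reason, or null when the reference passes.
type Check = (ev: EvidenceNode, intent: Intent) => string | null;

const checks: Check[] = [
  (ev, i) => (ev.workspaceId === i.workspaceId ? null : "outside workspace"), // workspace scope
  (ev, i) => (ev.createdAt < i.createdAt ? null : "postdates intent"), // temporal ordering
  (ev) => (ev.status === "active" ? null : `status is ${ev.status}`), // status liveness
  (ev, i) => (ev.authorId !== i.requesterId ? null : "self-authored"), // authorship independence
];

function validateEvidence(intent: Intent, graph: Map<string, EvidenceNode>) {
  const valid: string[] = [];
  const rejected: { ref: string; reason: string }[] = [];
  for (const ref of intent.evidenceRefs) {
    const ev = graph.get(ref);
    if (!ev) {
      rejected.push({ ref, reason: "not found" }); // existence
      continue;
    }
    // Stop at the first failing check for this reference.
    let reason: string | null = null;
    for (const check of checks) {
      reason = check(ev, intent);
      if (reason !== null) break;
    }
    if (reason !== null) rejected.push({ ref, reason });
    else valid.push(ref);
  }
  return { valid, rejected };
}

// Graduated enforcement: "evaluate" means the intent continues to LLM evaluation.
function enforce(mode: EnforcementMode, validCount: number, threshold: number, baseRisk: number) {
  const shortfall = Math.max(0, threshold - validCount);
  if (mode === "bootstrap" || shortfall === 0) return { outcome: "evaluate", risk: baseRisk };
  if (mode === "soft") return { outcome: "evaluate", risk: Math.min(1, baseRisk + 0.25 * shortfall) };
  return { outcome: "reject", risk: baseRisk }; // hard enforcement
}

const evidenceGraph = new Map<string, EvidenceNode>([
  ["decision:7", { id: "decision:7", workspaceId: "ws:1", createdAt: 500, status: "active", authorId: "human:alice" }],
  ["task:3", { id: "task:3", workspaceId: "ws:1", createdAt: 900, status: "active", authorId: "agent:coder" }],
  ["observation:9", { id: "observation:9", workspaceId: "ws:2", createdAt: 800, status: "active", authorId: "agent:observer" }],
]);

const intent: Intent = {
  workspaceId: "ws:1",
  requesterId: "agent:coder",
  createdAt: 1000,
  evidenceRefs: ["decision:7", "task:3", "observation:9", "decision:42"],
};

const { valid, rejected } = validateEvidence(intent, evidenceGraph);
console.log(valid.join(", ")); // decision:7
console.log(rejected.map((r) => `${r.ref}: ${r.reason}`).join("; "));
// task:3: self-authored; observation:9: outside workspace; decision:42: not found

// Assume this action requires 2 valid references and starts at risk 0.25.
for (const mode of ["bootstrap", "soft", "hard"] as const) {
  const r = enforce(mode, valid.length, 2, 0.25);
  console.log(`${mode}: ${r.outcome} (risk ${r.risk})`);
}
// bootstrap: evaluate (risk 0.25) / soft: evaluate (risk 0.5) / hard: reject (risk 0.25)
```

Running these checks before the model means a fabricated, stale, or self-authored reference is rejected without spending any LLM tokens.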
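Just-in-time learning injection with a token budget can be sketched as a filtered, priority-ordered greedy selection. The `priority` and `agents` fields, the helper names, and the four-characters-per-token estimate are assumptions made for illustration.

```typescript
// Hypothetical sketch of JIT learning selection under a token budget.
// The Learning fields, priority ordering, and token estimate are assumptions.
interface Learning {
  id: string;
  text: string;
  status: "proposed" | "active" | "deactivated";
  agents: string[]; // agent roles the learning applies to
  priority: number; // higher is injected first
}

// Crude estimate; a real implementation would use the target model's tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function selectLearnings(all: Learning[], agent: string, budget: number): Learning[] {
  const candidates = all
    .filter((l) => l.status === "active" && l.agents.indexOf(agent) !== -1)
    .sort((a, b) => b.priority - a.priority);
  const picked: Learning[] = [];
  let used = 0;
  for (const l of candidates) {
    const cost = estimateTokens(l.text);
    if (used + cost > budget) continue; // skip, but a smaller learning may still fit
    picked.push(l);
    used += cost;
  }
  return picked;
}

// Render the selection as a block to splice into the agent's system prompt.
function renderLearnings(learnings: Learning[]): string {
  return learnings.map((l) => `- ${l.text}`).join("\n");
}

const library: Learning[] = [
  { id: "l1", text: "Use tRPC for new internal APIs.", status: "active", agents: ["coder"], priority: 3 },
  { id: "l2", text: "Always link commits to the task they implement.", status: "active", agents: ["coder"], priority: 2 },
  { id: "l3", text: "Prefer small pull requests.", status: "proposed", agents: ["coder"], priority: 5 },
  { id: "l4", text: "Record rejected alternatives in decisions.", status: "active", agents: ["architect"], priority: 4 },
  { id: "l5", text: "Ask before guessing.", status: "active", agents: ["coder"], priority: 1 },
];

const selected = selectLearnings(library, "coder", 15);
console.log(selected.map((l) => l.id).join(", ")); // l1, l5
console.log(renderLearnings(selected));
```

The loop skips a learning that does not fit instead of stopping, so a short, lower-priority rule can still use leftover budget; stopping at the first overflow would be the stricter alternative.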
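The trend analysis described under Behaviors (drift, improvement, flat-line) could be approximated with a least-squares slope over recent scores. The thresholds, the three-sample minimum, reading "flat-line" as "scores that never change", and the helper names are all assumptions, not Osabio's scorer.

```typescript
// Hypothetical trend classifier for behavior scores. Thresholds and labels
// are assumptions; the README does not specify how trends are computed.
type Trend = "insufficient_data" | "flat_line" | "drift" | "improvement" | "stable";

// Least-squares slope of scores against their index (0, 1, 2, ...).
function slope(scores: number[]): number {
  const n = scores.length;
  const meanX = (n - 1) / 2;
  const meanY = scores.reduce((sum, y) => sum + y, 0) / n;
  let num = 0;
  let den = 0;
  scores.forEach((y, x) => {
    num += (x - meanX) * (y - meanY);
    den += (x - meanX) ** 2;
  });
  return num / den;
}

function classifyTrend(scores: number[], minSlope = 0.02): Trend {
  if (scores.length < 3) return "insufficient_data";
  // Identical scores suggest a stuck scorer or an unexercised behavior.
  if (Math.max(...scores) - Math.min(...scores) < 1e-9) return "flat_line";
  const m = slope(scores);
  if (m <= -minSlope) return "drift";
  if (m >= minSlope) return "improvement";
  return "stable";
}

// Bridge to the learning system: a declining score yields a *proposed*
// learning that still goes through human approval.
function maybeProposeLearning(behavior: string, scores: number[]) {
  return classifyTrend(scores) === "drift"
    ? { status: "proposed" as const, text: `Scores for "${behavior}" are declining; review recent failures.` }
    : null;
}

console.log(classifyTrend([0.9, 0.8, 0.7, 0.6])); // drift
console.log(classifyTrend([0.5, 0.6, 0.7, 0.8])); // improvement
console.log(classifyTrend([0.7, 0.7, 0.7, 0.7])); // flat_line
console.log(classifyTrend([0.7, 0.71, 0.69, 0.7])); // stable
```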
Autonomous systems don't fail from lack of intelligence. They fail from drift — slow divergence between what the system believes and what's actually true.
- Context Drift — Decisions made in v1.0 become poison for v2.0. Osabio uses temporal decay — nodes that aren't referenced lose weight over time. The Observer continuously scans for contradictions between decisions, verifies them with LLM reasoning, and synthesizes patterns into learnings that prevent repeat mistakes.
- Authority Drift — Too autonomous = dangerous. Too locked down = a dashboard. Osabio uses tiered authority scopes, ranging from atomic actions that need no human sign-off to multi-model consensus for high-stakes moves. Agents operate within risk budgets, not permission checkboxes.
- Reality Drift — If Osabio only reads its own graph, it's a delusion engine. The Observer performs truth audits, verifying claims against actual state through LLM verification pipelines with confidence scoring. Peer review cross-validates findings. When reality diverges from the graph, the system triggers a desync alert.
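Temporal decay, as described under Context Drift, can be modeled as exponential decay since a node was last referenced. A minimal sketch; the 30-day half-life, the 0.1 review floor, and the helper names are illustrative assumptions.

```typescript
// Hypothetical sketch of temporal decay: a node's weight halves for every
// `halfLifeDays` since it was last referenced. Half-life and floor are assumptions.
const DAY_MS = 24 * 60 * 60 * 1000;

interface WeightedNode {
  id: string;
  lastReferencedAt: number; // epoch ms
}

function decayedWeight(node: WeightedNode, now: number, halfLifeDays = 30): number {
  const ageDays = Math.max(0, now - node.lastReferencedAt) / DAY_MS;
  return Math.pow(2, -ageDays / halfLifeDays);
}

// Referencing a node (for example, citing a decision as evidence) resets its clock.
function touch(node: WeightedNode, now: number): WeightedNode {
  return { ...node, lastReferencedAt: now };
}

// Nodes below the floor are candidates for review, not silent deletion.
function staleNodes(nodes: WeightedNode[], now: number, floor = 0.1, halfLifeDays = 30): string[] {
  return nodes.filter((n) => decayedWeight(n, now, halfLifeDays) < floor).map((n) => n.id);
}

const now = 200 * DAY_MS;
const nodes: WeightedNode[] = [
  { id: "decision:a", lastReferencedAt: 200 * DAY_MS }, // referenced today
  { id: "decision:b", lastReferencedAt: 170 * DAY_MS }, // 30 days ago
  { id: "decision:c", lastReferencedAt: 140 * DAY_MS }, // 60 days ago
  { id: "decision:d", lastReferencedAt: 80 * DAY_MS }, // 120 days ago
];

for (const n of nodes) console.log(n.id, decayedWeight(n, now));
// weights: 1, 0.5, 0.25, 0.0625
console.log(staleNodes(nodes, now).join(", ")); // decision:d
console.log(decayedWeight(touch(nodes[3], now), now)); // 1
```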
Most autonomous platforms are black boxes. Osabio is a signed logic trace. Every decision, every dollar, every line of code has a provenance chain back to the intent that authorized it.
- Governance telemetry — Every decision is a node with a UUID, author, timestamp, and reasoning. Auditors can query the graph directly.
- Evidence-backed intent chains — When an agent spends money or merges code, the graph records which intent authorized it, which evidence justified it, which authority scope permitted it, and which human approved it. Every intent carries verifiable graph references that are validated before authorization — no action without provenance.
- Hierarchical traces — Agent executions are graph-native call trees. A subagent spawn becomes a root trace; each tool call, message, and decision is a child node. Traverse the full execution path with a graph query — from intent to final action.
- Policy-as-graph — Governance rules are versioned graph nodes with typed rules, scopes, and approval requirements. Policies follow a lifecycle with version chains — create, activate, deprecate, and diff through a dedicated management UI. The Authorizer evaluates intents against the policy graph before minting tokens — deterministic, auditable, and updateable without touching a single prompt.
- The "Judge" pattern — High-stakes actions go through an Authorizer Agent that validates intents against policy constraints before minting scoped tokens. The worker never sees master keys.
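Because traces are parent-linked call trees, the two common forensic questions ("how did this action happen?" and "what did this run touch?") reduce to walking up or down the tree. The in-memory sketch below assumes a hypothetical `TraceSpan` shape; in Osabio these walks would be graph queries.

```typescript
// Hypothetical trace model: each span points at its parent, and a subagent
// spawn is the root. Span kinds follow the README; field names are assumptions.
type SpanKind = "subagent" | "tool_call" | "message" | "decision";

interface TraceSpan {
  id: string;
  parentId: string | null;
  kind: SpanKind;
  label: string;
}

// Walk parent pointers from any span up to its root: "how did we get here?"
function pathToRoot(spans: TraceSpan[], id: string): TraceSpan[] {
  const byId = new Map<string, TraceSpan>();
  for (const s of spans) byId.set(s.id, s);
  const path: TraceSpan[] = [];
  let cur = byId.get(id);
  // The length guard stops the walk if the data accidentally contains a cycle.
  while (cur && path.length <= spans.length) {
    path.unshift(cur);
    cur = cur.parentId === null ? undefined : byId.get(cur.parentId);
  }
  return path;
}

// Breadth-first walk down from a root, optionally filtered by span kind.
function descendants(spans: TraceSpan[], rootId: string, kind?: SpanKind): TraceSpan[] {
  const children = new Map<string, TraceSpan[]>();
  for (const s of spans) {
    if (s.parentId === null) continue;
    const list = children.get(s.parentId) ?? [];
    list.push(s);
    children.set(s.parentId, list);
  }
  const out: TraceSpan[] = [];
  const seen = new Set<string>([rootId]);
  const queue = [rootId];
  while (queue.length > 0) {
    const id = queue.shift() as string;
    for (const child of children.get(id) ?? []) {
      if (seen.has(child.id)) continue;
      seen.add(child.id);
      if (!kind || child.kind === kind) out.push(child);
      queue.push(child.id);
    }
  }
  return out;
}

const spans: TraceSpan[] = [
  { id: "t1", parentId: null, kind: "subagent", label: "implement rate limiting" },
  { id: "t2", parentId: "t1", kind: "tool_call", label: "read src/billing/api.ts" },
  { id: "t3", parentId: "t1", kind: "decision", label: "apply limits in middleware" },
  { id: "t4", parentId: "t3", kind: "tool_call", label: "edit middleware file" },
  { id: "t5", parentId: "t1", kind: "message", label: "status update" },
];

console.log(pathToRoot(spans, "t4").map((s) => s.id).join(" > ")); // t1 > t3 > t4
console.log(descendants(spans, "t1", "tool_call").map((s) => s.id).join(", ")); // t2, t4
```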
The knowledge graph that coordinates your agents shouldn't be a black box you rent. It should be infrastructure you own, inspect, and extend.
- Full source access — Graph engine, MCP server, agent prompts, extraction pipeline. Every line.
- No vendor lock-in — Your data lives in your SurrealDB instance. Export anytime. Migrate anytime.
- Extend everything — Custom agent types, observation categories, feed cards, MCP tools.
- Community-driven — Agent prompts, authority scope templates, and integrations contributed by users.
| Layer | Technology |
|---|---|
| Graph | SurrealDB (SurrealKV storage engine) |
| Backend | Bun (Bun.serve) · TypeScript |
| Frontend | React · Tiptap · Reagraph |
| Auth | Better Auth · OAuth 2.1 · RAR · DPoP |
| LLM | Provider-agnostic (OpenRouter · Ollama · BYO keys) |
| Agents | MCP Server · MCP Tool Registry · Sandbox Runtime · Git Hooks |
```bash
# One-time workspace setup
$ osabio init
# Opens browser → authenticate → approve scopes
# ✓ Connected to workspace

# Start a task-scoped session
$ osabio start task:implement-rate-limiting
# Context: 3 decisions, 2 constraints, 1 open question
# Task status: todo → in_progress

# Or just open your MCP-compatible coding agent
$ codex
# SessionStart → project context loaded
# 4 decisions · 2 tasks · 1 recent observation
```

Prerequisites:

- Bun >= 1.3
- Docker (for SurrealDB)
- Either:
  - OpenRouter credentials, or
  - Ollama runtime + local models
Install dependencies and start SurrealDB:

```bash
bun install
docker compose up -d surrealdb surrealdb-init
```

Configure the environment (e.g. `.env`) for OpenRouter:

```
OPENROUTER_API_KEY=your_openrouter_key
CHAT_AGENT_MODEL=<chat-model-id>
EXTRACTION_MODEL=<extraction-model-id>
ANALYTICS_MODEL=<analytics-model-id>
PM_AGENT_MODEL=<pm-model-id>
OBSERVER_MODEL=<observer-model-id>
BEHAVIOR_SCORER_MODEL=<behavior-scorer-model-id>
EMBEDDING_MODEL=<embedding-model-id>
EMBEDDING_DIMENSION=1536
EXTRACTION_STORE_THRESHOLD=0.6
EXTRACTION_DISPLAY_THRESHOLD=0.85
SURREAL_URL=ws://127.0.0.1:8000/rpc
SURREAL_USERNAME=root
SURREAL_PASSWORD=root
SURREAL_NAMESPACE=brain
SURREAL_DATABASE=app
PORT=3000
```

Or, for Ollama:

```
INFERENCE_PROVIDER=ollama
OLLAMA_BASE_URL=http://127.0.0.1:11434
CHAT_AGENT_MODEL=<ollama-chat-model>
EXTRACTION_MODEL=<ollama-extraction-model>
ANALYTICS_MODEL=<ollama-analytics-model>
PM_AGENT_MODEL=<ollama-pm-model>
OBSERVER_MODEL=<ollama-observer-model>
BEHAVIOR_SCORER_MODEL=<ollama-behavior-scorer-model>
EMBEDDING_MODEL=<ollama-embedding-model>
EMBEDDING_DIMENSION=1536
EXTRACTION_STORE_THRESHOLD=0.6
EXTRACTION_DISPLAY_THRESHOLD=0.85
SURREAL_URL=ws://127.0.0.1:8000/rpc
SURREAL_USERNAME=root
SURREAL_PASSWORD=root
SURREAL_NAMESPACE=brain
SURREAL_DATABASE=app
PORT=3000
```

Run migrations and start the dev server:

```bash
bun migrate
bun run dev
```

Open http://localhost:3000.
Main HTTP endpoints:

- `POST /api/workspaces`: create workspace + bootstrap conversation
- `POST /api/chat/messages`: send message (supports file attachments)
- `GET /api/chat/stream/:messageId`: stream events
- `GET /api/workspaces/:workspaceId/feed`: governance feed
- `GET /api/graph/:workspaceId`: graph views
- `GET /api/entities/search`: full-text entity search
- `POST /api/mcp/:workspaceId/context`: intent-based MCP context resolution
- `POST /api/workspaces/:workspaceId/observer/scan`: trigger Observer graph scan
- `GET /api/workspaces/:workspaceId/observer/observations`: list Observer findings
- `GET /api/workspaces/:workspaceId/learnings`: list learnings (filterable by status, type, agent)
- `POST /api/workspaces/:workspaceId/learnings`: create a learning
- `PUT /api/workspaces/:workspaceId/learnings/:id`: edit a learning
- `POST /api/workspaces/:workspaceId/learnings/:id/approve`: approve a proposed learning
- `POST /api/workspaces/:workspaceId/learnings/:id/dismiss`: dismiss a proposed learning
- `POST /api/workspaces/:workspaceId/learnings/:id/deactivate`: deactivate an active learning
- `GET /api/workspaces/:workspaceId/policies`: list policies (filterable by status)
- `POST /api/workspaces/:workspaceId/policies`: create a draft policy
- `GET /api/workspaces/:workspaceId/policies/:id`: policy detail with edges and version chain
- `POST /api/workspaces/:workspaceId/policies/:id/activate`: activate a draft policy
- `POST /api/workspaces/:workspaceId/policies/:id/deprecate`: deprecate an active policy
- `POST /api/workspaces/:workspaceId/policies/:id/versions`: create a new version (supersede chain)
- `GET /api/workspaces/:workspaceId/policies/:id/versions`: version history
- `POST /api/workspaces/:workspaceId/objectives`: create an objective
- `GET /api/workspaces/:workspaceId/objectives`: list objectives
- `GET /api/workspaces/:workspaceId/objectives/:id`: get objective detail
- `PUT /api/workspaces/:workspaceId/objectives/:id`: update an objective
- `DELETE /api/workspaces/:workspaceId/objectives/:id`: delete an objective
- `GET /api/workspaces/:workspaceId/objectives/:id/progress`: get objective progress
- `POST /api/workspaces/:workspaceId/behaviors`: create a behavior
- `GET /api/workspaces/:workspaceId/behaviors`: list behaviors
- `PUT /api/workspaces/:workspaceId/behaviors/:id`: update a behavior
- `DELETE /api/workspaces/:workspaceId/behaviors/:id`: delete a behavior
- `POST /api/workspaces/:workspaceId/behaviors/score`: score a behavior
- `POST /api/workspaces/:workspaceId/behaviors/definitions`: create a behavior definition
- `GET /api/workspaces/:workspaceId/behaviors/definitions`: list behavior definitions
- `PUT /api/workspaces/:workspaceId/behaviors/definitions/:id`: update a behavior definition
- `DELETE /api/workspaces/:workspaceId/behaviors/definitions/:id`: delete a behavior definition
- `POST /api/workspaces/:workspaceId/agents`: create an agent
- `GET /api/workspaces/:workspaceId/agents`: list agents
- `GET /api/workspaces/:workspaceId/agents/:id`: get agent detail
- `PUT /api/workspaces/:workspaceId/agents/:id`: update an agent
- `DELETE /api/workspaces/:workspaceId/agents/:id`: delete an agent
- `GET /api/workspaces/:workspaceId/skills`: list skills
- `POST /api/workspaces/:workspaceId/skills`: create a skill
- `GET /api/workspaces/:workspaceId/skills/:id`: get skill detail
- `PUT /api/workspaces/:workspaceId/skills/:id`: update a skill
- `POST /api/workspaces/:workspaceId/skills/:id/activate`: activate a draft skill
- `POST /api/workspaces/:workspaceId/skills/:id/deprecate`: deprecate an active skill
- `GET /api/workspaces/:workspaceId/tool-registry/providers`: list credential providers
- `POST /api/workspaces/:workspaceId/tool-registry/providers`: create a credential provider
- `GET /api/workspaces/:workspaceId/tool-registry/tools`: list registered MCP tools
- `POST /api/workspaces/:workspaceId/tool-registry/grants`: create a tool access grant
- `GET /api/workspaces/:workspaceId/tool-registry/servers`: list MCP servers
- `POST /api/workspaces/:workspaceId/tool-registry/servers`: register an MCP server
- `POST /api/workspaces/:workspaceId/tool-registry/servers/:id/discover`: discover tools from an MCP server
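A small client for the learnings endpoints might build URLs like this. The paths come from the endpoint list; the query-parameter names for filtering (`status`, `type`, `agent`), the helper names, and the ID format are assumptions, and authentication is deliberately left to the caller.

```typescript
// Hypothetical client helpers for the learnings endpoints. Query-parameter
// names and helper names are assumptions, not a published Osabio client.
type LearningFilters = {
  status?: "proposed" | "active" | "deactivated";
  type?: string;
  agent?: string;
};

type LearningAction = "approve" | "dismiss" | "deactivate";

function learningsUrl(baseUrl: string, workspaceId: string, filters: LearningFilters = {}): string {
  const url = new URL(`/api/workspaces/${encodeURIComponent(workspaceId)}/learnings`, baseUrl);
  // Only set the filters the caller actually provided.
  if (filters.status) url.searchParams.set("status", filters.status);
  if (filters.type) url.searchParams.set("type", filters.type);
  if (filters.agent) url.searchParams.set("agent", filters.agent);
  return url.toString();
}

function learningActionUrl(baseUrl: string, workspaceId: string, learningId: string, action: LearningAction): string {
  const path = `/api/workspaces/${encodeURIComponent(workspaceId)}/learnings/${encodeURIComponent(learningId)}/${action}`;
  return new URL(path, baseUrl).toString();
}

// Authentication headers are supplied by the caller; this sketch does not
// describe how Osabio authenticates API requests.
async function runLearningAction(
  baseUrl: string,
  workspaceId: string,
  learningId: string,
  action: LearningAction,
  headers: Record<string, string> = {},
) {
  const res = await fetch(learningActionUrl(baseUrl, workspaceId, learningId, action), { method: "POST", headers });
  if (!res.ok) throw new Error(`${action} failed: HTTP ${res.status}`);
  return res;
}

console.log(learningsUrl("http://localhost:3000", "ws_123", { status: "active", agent: "architect" }));
// http://localhost:3000/api/workspaces/ws_123/learnings?status=active&agent=architect
console.log(learningActionUrl("http://localhost:3000", "ws_123", "abc123", "approve"));
// http://localhost:3000/api/workspaces/ws_123/learnings/abc123/approve
```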
Build the CLI:

```bash
bun run build:cli
# outputs ./osabio
```

Initialize repo integration:

```bash
OSABIO_SERVER_URL=http://localhost:3000 \
OSABIO_WORKSPACE_ID=<workspace-id> \
osabio init
```

`osabio init` sets up:

- `~/.osabio/config.json`: auth entry
- `.mcp.json`: server registration
- `.claude/settings.json`: hooks
- `CLAUDE.md`: integration block
- Osabio slash commands and git hooks
Run MCP directly:

```bash
osabio mcp
```

Development commands:

```bash
bun run dev
bun run start
bun run typecheck
bun test tests/unit/
bun test --env-file=.env tests/acceptance/
bun run eval
bun run eval:watch
bun migrate
```

Project layout:

```text
app/
  server.ts               # Bun entrypoint
  src/client/             # chat/feed/graph/learning-library/skill-library/agent-registry/tool-registry UI
  src/server/
    observer/             # graph scanning, LLM verification, peer review, learning diagnosis
    agents/               # agent CRUD, sandbox adapter, orchestrator agents
    learning/             # learning CRUD, collision detection, pattern detection
    policy/               # policy CRUD, validation, versioning, lifecycle
    objective/            # objective CRUD, alignment evaluator, progress tracking
    behavior/             # behavior telemetry, scorer, definitions, trend analysis
    intent/               # intent creation, evidence verification, graduated enforcement
    skill/                # skill CRUD, lifecycle, agent-skill assignment
    chat/                 # chat agent, tools, context
    extraction/           # extraction pipeline
    orchestrator/         # sandbox session lifecycle, event bridge, session store
    mcp/                  # agent MCP route, scope engine, intent-gated tool calls
    tools/                # shared AI SDK tool definitions (chat, PM, observer, proxy)
    tool-registry/        # MCP tool discovery, credential brokerage, governance
    proxy/                # proxy compliance, sessions, spend, traces
    cli/                  # osabio CLI + MCP server
schema/
  surreal-schema.surql    # base schema
  migrations/             # versioned migrations (84+)
tests/
  unit/                   # deterministic unit tests
  acceptance/             # acceptance tests (in-process server + isolated DB)
  evals/                  # model eval suites + scorers (incl. observer evals)
```
Early-stage and actively developed.