Osabio

The operating system for autonomous organizations.

Your agents have amnesia. They can't talk to each other. You spend 80% of your time as a high-paid secretary, copy-pasting context between them. Osabio is the self-correcting knowledge graph that gives your agents shared memory, governed autonomy, and verifiable intent.

The Problem

You are the integration layer between your AI agents.

You already use AI agents. Your coding agent writes your code. Your chat assistant helps you think through architecture. Your editor agent autocompletes. But none of them share context, so the glue work falls to you:

Copy error logs from CI into your coding agent
Relay architecture decisions from a chat into your codebase
Manually track what was decided, what's blocked, what changed
Re-explain project state every time you start a new session
Context-switch between tools, losing continuity at every step

Every agent you add makes it worse. Osabio fixes this — not by replacing your agents, but by giving them shared memory.

Architecture

Why a graph, not a message bus. Most platforms try "agent swarms" — agents messaging agents. That creates a game of telephone where instructions get distorted. The graph is the single source of truth.

Human Layer
  → Web Chat / Feed / Graph View / Learning Library / Policy Management
  → Skill Library / Agent Registry / Tool Registry / Terminal

Agent Layer
  → Architect / Strategist / Management / Design Partner / Observer
  → Sandbox Agents (Claude Code, Codex, Aider — isolated runtime)

Graph Layer
  → Projects / Decisions / Tasks / Observations / Features / Questions
  → Suggestions / Conversations / Commits / Intents / Learnings
  → Objectives / Behaviors / Traces / Policies / Skills / Agents

Auth Layer
  → OAuth 2.1 / RAR (RFC 9396) / DPoP (RFC 9449) / Better Auth IdP
  → Evidence-backed intent authorization / Graduated enforcement

Integration Layer
  → GitHub / Slack / Git Hooks / MCP Protocol / MCP Tool Registry / ERC-8004

Approach	Agent swarms / message buses	Knowledge graph (Osabio)
Logic	Scripted workflows (if A, do B)	State-based graph (emergent logic)
Memory	Ephemeral, session-based	Persistent, pruned, versioned
Coordination	Agents message agents (telephone game)	Agents read/write to shared truth
Verification	Assumes API calls work	Continuous telemetry (reality grounding)
Autonomy	"Let it rip" (high risk of loops)	Authority scopes (risk-managed)
Over time	Performance degrades	System gets smarter via learnings
Security	Sandbox isolation (the "box")	Governance graph + sandbox + evidence-backed intents
Auditing	Log-based (text dumps)	Graph-based (hierarchical traces, machine-readable)

Specialized Agents

Each agent has a role, a domain, and authority scopes. They coordinate through the knowledge graph — not through you.

Architect — Technical decisions, system design, architecture constraints. Checks implementations against what was decided. Resolves conflicts between competing approaches.
Strategist — Market positioning, pricing, GTM, competitive response. Challenges product decisions against business viability.
Management — Task tracking, priority management, execution velocity. Flags blocked work, stale decisions, and resource conflicts.
Sandbox Agents — Your coding tools (Claude Code, Codex, Aider) running in isolated sandboxed environments with their own runtime, proxy tokens, and MCP connections. Created and managed through the Agent Registry UI with configurable authority scopes (11 actions, each auto/propose/blocked) and assigned skills for domain expertise. Agents are assigned to tasks from the orchestrator and execute within intent-gated governance — every tool call goes through scope evaluation before execution.
Design Partner — Brainstorms product ideas, asks probing questions, identifies gaps. Shapes vague ideas into structured projects, features, and decisions.
Observer — Continuously scans the graph for contradictions between decisions, stale tasks, status drift, and cross-project conflicts. Findings are verified through LLM reasoning pipelines with confidence scoring and evidence tracking. A peer review layer cross-validates observations to prevent false positives. Synthesizes recurring patterns into actionable suggestions. Proposes learnings from root cause analysis so the system self-corrects.
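The scope evaluation described for Sandbox Agents can be pictured with a small sketch. The three modes (auto, propose, blocked) come from the text above; the action names, the `evaluateScope` function, and the default-to-blocked rule are illustrative assumptions, not Osabio's actual API.

```typescript
// The three scope modes named in the text.
type ScopeMode = "auto" | "propose" | "blocked";

// Hypothetical action names: the text says there are 11 actions but does not list them.
type AuthorityScopes = Record<string, ScopeMode>;

type ScopeOutcome =
  | { kind: "execute" }
  | { kind: "needs_review"; action: string }
  | { kind: "rejected"; action: string };

// Assumption: an action with no configured mode is treated as blocked ("start restrictive").
function evaluateScope(scopes: AuthorityScopes, action: string): ScopeOutcome {
  const mode: ScopeMode = scopes[action] ?? "blocked";
  switch (mode) {
    case "auto":
      return { kind: "execute" };
    case "propose":
      return { kind: "needs_review", action };
    case "blocked":
      return { kind: "rejected", action };
  }
}

// Example configuration for one sandbox agent (all names are made up).
const sandboxAgentScopes: AuthorityScopes = {
  create_observation: "auto",
  create_decision: "propose",
  merge_code: "blocked",
};

const outcome = evaluateScope(sandboxAgentScopes, "create_decision");
console.log(outcome.kind);
```

Expanding trust over time then amounts to flipping individual entries from `blocked` to `propose` to `auto`.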

How Coordination Works

No agent messages another agent. They write structured signals to the knowledge graph. The graph makes it visible to the right agent at the right time.

Sandbox agent notices a contradiction. While implementing rate limiting, a sandbox agent detects that src/billing/api.ts uses REST — but the graph has a confirmed decision to standardize on tRPC. It logs an observation via its intent-gated MCP connection.
Architect agent sees it on next context load. The Architect checks the observation against constraints. Confirms the contradiction is real. Generates a suggestion: "Migrate billing API to tRPC, or revisit the standardization decision."
Suggestion surfaces in your feed. You see the suggestion with full provenance — the observation, the contradicted decision, the Architect's reasoning. You accept it. A migration task is created with one click.
Next sandbox agent picks up the task. You assign the migration task to a sandbox agent from the orchestrator. It loads the relevant skills, decomposes into subtasks, and executes — each tool call authorized by evidence-backed intents. Status rolls up automatically. No human copied anything between tabs.
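The four steps above can be sketched as writes to one shared store, with provenance recovered by following references. This is a toy in-memory model under stated assumptions: `SharedGraph`, its methods, and the id format are hypothetical and stand in for the real SurrealDB-backed graph.

```typescript
type NodeKind = "decision" | "observation" | "suggestion" | "task";

interface GraphNode {
  id: string;
  kind: NodeKind;
  text: string;
  author: string;
  refs: string[]; // ids of the nodes this node was derived from
}

class SharedGraph {
  private nodes = new Map<string, GraphNode>();
  private nextId = 1;

  // Every agent (and human) coordinates by writing nodes, never by messaging a peer.
  write(kind: NodeKind, text: string, author: string, refs: string[] = []): GraphNode {
    const node: GraphNode = { id: `${kind}:${this.nextId++}`, kind, text, author, refs };
    this.nodes.set(node.id, node);
    return node;
  }

  byKind(kind: NodeKind): GraphNode[] {
    return Array.from(this.nodes.values()).filter((n) => n.kind === kind);
  }

  // Follow refs transitively to recover the provenance chain of a node.
  provenance(id: string): GraphNode[] {
    const chain: GraphNode[] = [];
    const seen = new Set<string>();
    const stack: string[] = [id];
    while (stack.length > 0) {
      const current = this.nodes.get(stack.pop()!);
      if (current === undefined || seen.has(current.id)) continue;
      seen.add(current.id);
      chain.push(current);
      stack.push(...current.refs);
    }
    return chain;
  }
}

const graph = new SharedGraph();
const decision = graph.write("decision", "Standardize on tRPC", "human");
const observation = graph.write("observation", "src/billing/api.ts uses REST", "sandbox-agent", [decision.id]);
const suggestion = graph.write("suggestion", "Migrate billing API to tRPC, or revisit the standardization decision", "architect", [observation.id]);
const task = graph.write("task", "Migrate billing API to tRPC", "human", [suggestion.id]);

console.log(graph.provenance(task.id).map((n) => `${n.kind} by ${n.author}`));
```

The point of the sketch is the shape of the data: the task carries its whole justification with it, so no one has to re-explain why it exists.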

Key Concepts

Decisions — Every decision is tracked — who made it, why, and what alternatives were considered. Agents propose. Humans confirm.
Observations — Agents surface contradictions, gaps, patterns, and risks as they work. Observations accumulate and compound into actionable suggestions.
Suggestions — Agents tell you what you should be thinking about. Accept a suggestion and it becomes a task, decision, or feature — with full trace back to the evidence.
Projects, Features & Tasks — Work breaks down hierarchically. Agents decompose tasks at runtime. Status rolls up automatically.
Questions — When an agent doesn't know, it asks instead of guessing. You answer. The answer becomes a decision in the graph.
Conversations — Every chat produces structured knowledge. Conversations group by project automatically.
Commits — Code is linked to the decisions and tasks it implements. Contradictions are caught before they land.
Intents — Every agent action starts as an intent — a structured request in the graph. Intents require evidence-backed authorization: agents must provide verifiable graph references (evidence_refs), which a deterministic pipeline validates (existence, workspace scope, temporal ordering, status liveness, authorship independence) before any LLM evaluation. Graduated enforcement moves each workspace through three phases: bootstrap exemption, soft enforcement (missing evidence raises the risk score), and hard enforcement (insufficient evidence rejects the intent). Workspace admins configure per-action evidence thresholds via policy rules.
Authority Scopes — Control what each agent can do without asking. 11 configurable actions per agent, each set to auto-approve, propose-for-review, or blocked. Start restrictive. Expand trust over time.
Learnings — Behavioral rules injected into agent prompts at runtime via JIT loading with token budgets. Learnings follow a lifecycle (proposed → active → deactivated) and can be created by humans, suggested by agents, or proposed by the Observer from root cause analysis. Three-layer collision detection prevents duplicates. Pattern detection identifies recurring issues and suggests learnings automatically. The Learning Library UI lets you browse, filter, approve, edit, and deactivate learnings across all agents.
Objectives — Strategic goals that give agent work direction. Objectives link to projects, features, and tasks — providing alignment context for every intent. Progress is computed by graph-traversing linked intents. The Observer audits for orphaned decisions and stale objectives via coherence scans.
Behaviors — Measurable behavioral expectations attached to objectives. Each behavior has a scoring function (LLM-evaluated or definition-matched), trend analysis (drift, improvement, flat-line detection), and a bridge to the learning system that proposes corrective learnings when scores decline. Behaviors feed into policy enforcement — the Authorizer checks behavior scores before granting intent authorization. Workspace admins define behavior definitions with configurable scoring criteria, thresholds, and remediation guidance.
Identity — One person across all tools. Your Slack, GitHub, and terminal sessions all resolve to the same identity.
Skills — Graph-native behavioral expertise documents that give agents proactive domain knowledge from their first session. Skills sit between Tools (functional capabilities) and Learnings (reactive corrections) — they encode governed, versionable expertise like "supply chain risk assessment" or "compliance audit procedures." Skills follow a lifecycle (draft → active → deprecated), link to agents via possesses edges, and can be governed by policies via governs_skill relations. The Skill Library UI lets you create, browse, activate, and assign skills to agents.
Agents — First-class graph entities representing sandboxed coding agents (Claude Code, Codex, Aider). Each agent has a runtime configuration, sandbox settings, authority scopes, assigned skills, and available tools. Created through a 3-step wizard (Config → Skills → Tools) in the Agent Registry UI. Agents are assigned to tasks and execute in isolated sessions with intent-gated MCP governance.
MCP Tool Registry — Centralized discovery and management of MCP tools across external servers. Includes OAuth 2.1 discovery (RFC 9728), dynamic client registration, PKCE authorization, and automatic token refresh for authenticated MCP servers. Credentials are encrypted with AES-256-GCM. Workspace admins control tool access through grants and governance policies. The Tool Registry UI provides tabs for providers, accounts, tools, grants, MCP servers, and a discovery review panel for selective import.
Agent Sessions — Every session is remembered. The next agent knows what the last one did.
Traces — Every agent execution is a graph-native call tree. Subagent spawns, tool calls, and decisions form a hierarchical trace you can traverse, query, and audit. Forensic debugging is a graph query, not grep.
Policies — Deterministic governance rules stored as graph nodes, not prompt text. Each policy carries typed rules, scopes, and approval requirements. Policies follow a lifecycle (draft → active → deprecated) with version chains — create a new version and the previous one is superseded atomically. The Policy Management UI lets you create, activate, deprecate, version, diff, and trace policies. The Authorizer evaluates intents against the policy graph before minting tokens — no prompt rewriting needed.
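To make the Intents entry concrete, here is a minimal sketch of a deterministic evidence check followed by graduated enforcement. The five check names and three phases come from the text; everything else is an assumption made for illustration, including the reading of each check (for example, that "authorship independence" means the evidence was not authored by the requesting agent), the `live` flag, and the size of the risk penalty.

```typescript
type EnforcementPhase = "bootstrap" | "soft" | "hard";
type CheckName =
  | "existence"
  | "workspace_scope"
  | "temporal_ordering"
  | "status_liveness"
  | "authorship_independence";

interface EvidenceNode {
  id: string;
  workspaceId: string;
  authorId: string;
  createdAt: number; // epoch milliseconds
  live: boolean; // hypothetical flag: false once superseded or archived
}

interface Intent {
  workspaceId: string;
  agentId: string;
  createdAt: number;
  evidenceRefs: string[];
}

// Returns the checks that one reference fails (empty array means valid evidence).
function failedChecks(intent: Intent, ref: string, nodes: Map<string, EvidenceNode>): CheckName[] {
  const node = nodes.get(ref);
  if (node === undefined) return ["existence"];
  const failed: CheckName[] = [];
  if (node.workspaceId !== intent.workspaceId) failed.push("workspace_scope");
  if (node.createdAt > intent.createdAt) failed.push("temporal_ordering");
  if (!node.live) failed.push("status_liveness");
  if (node.authorId === intent.agentId) failed.push("authorship_independence");
  return failed;
}

interface Verdict {
  allowed: boolean;
  riskPenalty: number;
  validEvidence: number;
}

function enforce(intent: Intent, nodes: Map<string, EvidenceNode>, phase: EnforcementPhase, requiredEvidence: number): Verdict {
  const validEvidence = intent.evidenceRefs.filter((ref) => failedChecks(intent, ref, nodes).length === 0).length;
  if (phase === "bootstrap" || validEvidence >= requiredEvidence) {
    return { allowed: true, riskPenalty: 0, validEvidence };
  }
  if (phase === "soft") {
    // Soft phase: do not block, but raise risk per missing item (0.2 is a made-up value).
    return { allowed: true, riskPenalty: 0.2 * (requiredEvidence - validEvidence), validEvidence };
  }
  return { allowed: false, riskPenalty: 0, validEvidence };
}

const evidenceNodes = new Map<string, EvidenceNode>([
  ["decision:1", { id: "decision:1", workspaceId: "ws", authorId: "human", createdAt: 100, live: true }],
  ["observation:2", { id: "observation:2", workspaceId: "ws", authorId: "agent-a", createdAt: 150, live: true }],
]);
const sampleIntent: Intent = {
  workspaceId: "ws",
  agentId: "agent-a",
  createdAt: 200,
  evidenceRefs: ["decision:1", "observation:2", "missing:3"],
};

console.log(enforce(sampleIntent, evidenceNodes, "soft", 2));
console.log(enforce(sampleIntent, evidenceNodes, "hard", 2));
```

Only `decision:1` counts here: `observation:2` was written by the requesting agent itself and `missing:3` does not exist, so the same intent passes with a penalty in the soft phase and is rejected in the hard phase.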

Reliability: Solving the Three Drifts

Autonomous systems don't fail from lack of intelligence. They fail from drift — slow divergence between what the system believes and what's actually true.

Context Drift — Decisions made in v1.0 become poison for v2.0. Osabio uses temporal decay — nodes that aren't referenced lose weight over time. The Observer continuously scans for contradictions between decisions, verifies them with LLM reasoning, and synthesizes patterns into learnings that prevent repeat mistakes.
Authority Drift — Too autonomous = dangerous. Too locked down = a dashboard. Osabio uses tiered authority scopes — from zero-human atomic actions to multi-model consensus for high-stakes moves. Agents operate within risk budgets, not permission checkboxes.
Reality Drift — If Osabio only reads its own graph, it's a delusion engine. The Observer performs truth audits — verifying claims against actual state through LLM verification pipelines with confidence scoring. Peer review cross-validates findings. When reality diverges from the graph, the system triggers a desync alert.
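The temporal decay mentioned under Context Drift can be illustrated with a small sketch. The text only says that unreferenced nodes lose weight over time; the exponential form, the 30-day half-life, and the names `decayedWeight` and `touch` are assumptions chosen for the example.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

interface WeightedNode {
  id: string;
  baseWeight: number;
  lastReferencedAt: number; // epoch milliseconds
}

// Assumed model: weight halves for every `halfLifeDays` the node goes unreferenced.
function decayedWeight(node: WeightedNode, now: number, halfLifeDays: number = 30): number {
  const idleDays = Math.max(0, now - node.lastReferencedAt) / DAY_MS;
  return node.baseWeight * Math.pow(0.5, idleDays / halfLifeDays);
}

// Referencing a node resets its clock, so decisions still in use keep their weight.
function touch(node: WeightedNode, now: number): WeightedNode {
  return { ...node, lastReferencedAt: now };
}

const oldDecision: WeightedNode = { id: "decision:v1-auth", baseWeight: 1, lastReferencedAt: 0 };
const sixtyDaysLater = 60 * DAY_MS;

console.log(decayedWeight(oldDecision, sixtyDaysLater));
console.log(decayedWeight(touch(oldDecision, sixtyDaysLater), sixtyDaysLater));
```

Under this model a v1.0 decision nobody has cited for two half-lives contributes a quarter of its original weight, while one that was just referenced counts in full.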

Verifiable Autonomy

Most autonomous platforms are black boxes. Osabio is a signed logic trace. Every decision, every dollar, every line of code has a provenance chain back to the intent that authorized it.

Governance telemetry — Every decision is a node with a UUID, author, timestamp, and reasoning. Auditors can query the graph directly.
Evidence-backed intent chains — When an agent spends money or merges code, the graph records which intent authorized it, which evidence justified it, which authority scope permitted it, and which human approved it. Every intent carries verifiable graph references that are validated before authorization — no action without provenance.
Hierarchical traces — Agent executions are graph-native call trees. A subagent spawn becomes a root trace; each tool call, message, and decision is a child node. Traverse the full execution path with a graph query — from intent to final action.
Policy-as-graph — Governance rules are versioned graph nodes with typed rules, scopes, and approval requirements. Policies follow a lifecycle with version chains — create, activate, deprecate, and diff through a dedicated management UI. The Authorizer evaluates intents against the policy graph before minting tokens — deterministic, auditable, and updateable without touching a single prompt.
The "Judge" pattern — High-stakes actions go through an Authorizer Agent that validates intents against policy constraints before minting scoped tokens. The worker never sees master keys.
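A hierarchical trace as described above is a tree you can walk. The sketch below assumes a simple in-memory shape (`TraceNode`, `flattenTrace`, `pathTo` are hypothetical names, and the file names in the sample data are invented); in Osabio itself the equivalent traversal would be a graph query.

```typescript
type TraceKind = "subagent_spawn" | "tool_call" | "message" | "decision";

interface TraceNode {
  id: string;
  kind: TraceKind;
  label: string;
  children: TraceNode[];
}

// Depth-first walk that returns every node with its depth in the call tree.
function flattenTrace(root: TraceNode, depth: number = 0): Array<{ depth: number; node: TraceNode }> {
  const rows = [{ depth, node: root }];
  for (const child of root.children) {
    rows.push(...flattenTrace(child, depth + 1));
  }
  return rows;
}

// Ids along the path from the root trace to a target node, or undefined if absent.
function pathTo(root: TraceNode, targetId: string): string[] | undefined {
  if (root.id === targetId) return [root.id];
  for (const child of root.children) {
    const rest = pathTo(child, targetId);
    if (rest !== undefined) return [root.id, ...rest];
  }
  return undefined;
}

// Sample trace: a subagent spawn is the root, everything it did hangs beneath it.
const sampleTrace: TraceNode = {
  id: "t1",
  kind: "subagent_spawn",
  label: "sandbox agent: migrate billing API",
  children: [
    { id: "t2", kind: "tool_call", label: "read src/billing/api.ts", children: [] },
    {
      id: "t3",
      kind: "decision",
      label: "replace REST handlers with a tRPC router",
      children: [{ id: "t4", kind: "tool_call", label: "write src/billing/router.ts", children: [] }],
    },
    { id: "t5", kind: "message", label: "migration complete", children: [] },
  ],
};

for (const row of flattenTrace(sampleTrace)) {
  console.log(`${"  ".repeat(row.depth)}${row.node.kind}: ${row.node.label}`);
}
console.log(pathTo(sampleTrace, "t4"));
```

Asking "which decision led to this file write?" becomes a path lookup rather than a search through log text.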

Open Source

The knowledge graph that coordinates your agents shouldn't be a black box you rent. It should be infrastructure you own, inspect, and extend.

Full source access — Graph engine, MCP server, agent prompts, extraction pipeline. Every line.
No vendor lock-in — Your data lives in your SurrealDB instance. Export anytime. Migrate anytime.
Extend everything — Custom agent types, observation categories, feed cards, MCP tools.
Community-driven — Agent prompts, authority scope templates, and integrations contributed by users.

Tech Stack

Layer	Technology
Graph	SurrealDB (SurrealKV storage engine)
Backend	Bun (`Bun.serve`) · TypeScript
Frontend	React · Tiptap · Reagraph
Auth	Better Auth · OAuth 2.1 · RAR · DPoP
LLM	Provider-agnostic (OpenRouter · Ollama · BYO keys)
Agents	MCP Server · MCP Tool Registry · Sandbox Runtime · Git Hooks

Connect in 60 Seconds

# One-time workspace setup
$ osabio init
# Opens browser → authenticate → approve scopes
# ✓ Connected to workspace

# Start a task-scoped session
$ osabio start task:implement-rate-limiting
# Context: 3 decisions, 2 constraints, 1 open question
# Task status: todo → in_progress

# Or just open your MCP-compatible coding agent
$ codex
# SessionStart → project context loaded
# 4 decisions · 2 tasks · 1 recent observation

Quickstart

1) Prerequisites

Bun >=1.3
Docker (for SurrealDB)
Either:
- OpenRouter credentials, or
- Ollama runtime + local models

2) Install dependencies

bun install

3) Start SurrealDB

docker compose up -d surrealdb surrealdb-init

4) Configure environment

OpenRouter profile

OPENROUTER_API_KEY=your_openrouter_key
CHAT_AGENT_MODEL=<chat-model-id>
EXTRACTION_MODEL=<extraction-model-id>
ANALYTICS_MODEL=<analytics-model-id>
PM_AGENT_MODEL=<pm-model-id>
OBSERVER_MODEL=<observer-model-id>
BEHAVIOR_SCORER_MODEL=<behavior-scorer-model-id>
EMBEDDING_MODEL=<embedding-model-id>
EMBEDDING_DIMENSION=1536
EXTRACTION_STORE_THRESHOLD=0.6
EXTRACTION_DISPLAY_THRESHOLD=0.85
SURREAL_URL=ws://127.0.0.1:8000/rpc
SURREAL_USERNAME=root
SURREAL_PASSWORD=root
SURREAL_NAMESPACE=brain
SURREAL_DATABASE=app
PORT=3000

Ollama profile

INFERENCE_PROVIDER=ollama
OLLAMA_BASE_URL=http://127.0.0.1:11434
CHAT_AGENT_MODEL=<ollama-chat-model>
EXTRACTION_MODEL=<ollama-extraction-model>
ANALYTICS_MODEL=<ollama-analytics-model>
PM_AGENT_MODEL=<ollama-pm-model>
OBSERVER_MODEL=<ollama-observer-model>
BEHAVIOR_SCORER_MODEL=<ollama-behavior-scorer-model>
EMBEDDING_MODEL=<ollama-embedding-model>
EMBEDDING_DIMENSION=1536
EXTRACTION_STORE_THRESHOLD=0.6
EXTRACTION_DISPLAY_THRESHOLD=0.85
SURREAL_URL=ws://127.0.0.1:8000/rpc
SURREAL_USERNAME=root
SURREAL_PASSWORD=root
SURREAL_NAMESPACE=brain
SURREAL_DATABASE=app
PORT=3000
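If you want to sanity-check a profile before starting the server, a small validator along these lines may help. It is a sketch, not part of the Osabio codebase: `loadConfig` and its helpers are hypothetical, it assumes OpenRouter is the default when INFERENCE_PROVIDER is unset (the OpenRouter profile above does not set it), and it assumes the store threshold is meant to be the looser of the two extraction thresholds.

```typescript
type Env = Record<string, string | undefined>;

interface OsabioConfig {
  provider: "openrouter" | "ollama";
  surrealUrl: string;
  embeddingDimension: number;
  storeThreshold: number;
  displayThreshold: number;
  port: number;
}

function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing environment variable: ${name}`);
  }
  return value;
}

function requireNumber(env: Env, name: string): number {
  const parsed = Number(requireVar(env, name));
  if (!Number.isFinite(parsed)) {
    throw new Error(`Environment variable ${name} must be a number`);
  }
  return parsed;
}

function loadConfig(env: Env): OsabioConfig {
  // Assumption: OpenRouter is the default provider when INFERENCE_PROVIDER is unset.
  const provider = env["INFERENCE_PROVIDER"] === "ollama" ? "ollama" : "openrouter";
  requireVar(env, provider === "ollama" ? "OLLAMA_BASE_URL" : "OPENROUTER_API_KEY");

  const storeThreshold = requireNumber(env, "EXTRACTION_STORE_THRESHOLD");
  const displayThreshold = requireNumber(env, "EXTRACTION_DISPLAY_THRESHOLD");
  // Assumption: storing is the looser bar and displaying the stricter one.
  if (storeThreshold > displayThreshold) {
    throw new Error("EXTRACTION_STORE_THRESHOLD should not exceed EXTRACTION_DISPLAY_THRESHOLD");
  }

  return {
    provider,
    surrealUrl: requireVar(env, "SURREAL_URL"),
    embeddingDimension: requireNumber(env, "EMBEDDING_DIMENSION"),
    storeThreshold,
    displayThreshold,
    port: requireNumber(env, "PORT"),
  };
}

// A trimmed copy of the Ollama profile above, passed in as a plain object.
const ollamaProfile: Env = {
  INFERENCE_PROVIDER: "ollama",
  OLLAMA_BASE_URL: "http://127.0.0.1:11434",
  EMBEDDING_DIMENSION: "1536",
  EXTRACTION_STORE_THRESHOLD: "0.6",
  EXTRACTION_DISPLAY_THRESHOLD: "0.85",
  SURREAL_URL: "ws://127.0.0.1:8000/rpc",
  PORT: "3000",
};

console.log(loadConfig(ollamaProfile));
```

Under Bun you could pass `Bun.env` instead of the literal object. The function takes the environment as an argument so it stays testable without touching real process state.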

5) Apply migrations

bun migrate

6) Run the app

bun run dev

Open http://localhost:3000.

API at a Glance

POST /api/workspaces create workspace + bootstrap conversation
POST /api/chat/messages send message (supports file attachments)
GET /api/chat/stream/:messageId stream events
GET /api/workspaces/:workspaceId/feed governance feed
GET /api/graph/:workspaceId graph views
GET /api/entities/search full-text entity search
POST /api/mcp/:workspaceId/context intent-based MCP context resolution
POST /api/workspaces/:workspaceId/observer/scan trigger Observer graph scan
GET /api/workspaces/:workspaceId/observer/observations list Observer findings
GET /api/workspaces/:workspaceId/learnings list learnings (filterable by status, type, agent)
POST /api/workspaces/:workspaceId/learnings create a learning
PUT /api/workspaces/:workspaceId/learnings/:id edit a learning
POST /api/workspaces/:workspaceId/learnings/:id/approve approve a proposed learning
POST /api/workspaces/:workspaceId/learnings/:id/dismiss dismiss a proposed learning
POST /api/workspaces/:workspaceId/learnings/:id/deactivate deactivate an active learning
GET /api/workspaces/:workspaceId/policies list policies (filterable by status)
POST /api/workspaces/:workspaceId/policies create a draft policy
GET /api/workspaces/:workspaceId/policies/:id policy detail with edges and version chain
POST /api/workspaces/:workspaceId/policies/:id/activate activate a draft policy
POST /api/workspaces/:workspaceId/policies/:id/deprecate deprecate an active policy
POST /api/workspaces/:workspaceId/policies/:id/versions create a new version (supersede chain)
GET /api/workspaces/:workspaceId/policies/:id/versions version history
POST /api/workspaces/:workspaceId/objectives create an objective
GET /api/workspaces/:workspaceId/objectives list objectives
GET /api/workspaces/:workspaceId/objectives/:id get objective detail
PUT /api/workspaces/:workspaceId/objectives/:id update an objective
DELETE /api/workspaces/:workspaceId/objectives/:id delete an objective
GET /api/workspaces/:workspaceId/objectives/:id/progress get objective progress
POST /api/workspaces/:workspaceId/behaviors create a behavior
GET /api/workspaces/:workspaceId/behaviors list behaviors
PUT /api/workspaces/:workspaceId/behaviors/:id update a behavior
DELETE /api/workspaces/:workspaceId/behaviors/:id delete a behavior
POST /api/workspaces/:workspaceId/behaviors/score score a behavior
POST /api/workspaces/:workspaceId/behaviors/definitions create a behavior definition
GET /api/workspaces/:workspaceId/behaviors/definitions list behavior definitions
PUT /api/workspaces/:workspaceId/behaviors/definitions/:id update a behavior definition
DELETE /api/workspaces/:workspaceId/behaviors/definitions/:id delete a behavior definition
POST /api/workspaces/:workspaceId/agents create an agent
GET /api/workspaces/:workspaceId/agents list agents
GET /api/workspaces/:workspaceId/agents/:id get agent detail
PUT /api/workspaces/:workspaceId/agents/:id update an agent
DELETE /api/workspaces/:workspaceId/agents/:id delete an agent
GET /api/workspaces/:workspaceId/skills list skills
POST /api/workspaces/:workspaceId/skills create a skill
GET /api/workspaces/:workspaceId/skills/:id get skill detail
PUT /api/workspaces/:workspaceId/skills/:id update a skill
POST /api/workspaces/:workspaceId/skills/:id/activate activate a draft skill
POST /api/workspaces/:workspaceId/skills/:id/deprecate deprecate an active skill
GET /api/workspaces/:workspaceId/tool-registry/providers list credential providers
POST /api/workspaces/:workspaceId/tool-registry/providers create a credential provider
GET /api/workspaces/:workspaceId/tool-registry/tools list registered MCP tools
POST /api/workspaces/:workspaceId/tool-registry/grants create a tool access grant
GET /api/workspaces/:workspaceId/tool-registry/servers list MCP servers
POST /api/workspaces/:workspaceId/tool-registry/servers register an MCP server
POST /api/workspaces/:workspaceId/tool-registry/servers/:id/discover discover tools from MCP server
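As a usage sketch for one of the routes above, this helper builds the URL for listing learnings. The path comes from the list; the query parameter names (`status`, `type`, `agent`) are guesses based on the "filterable by status, type, agent" note, and the response shape is not documented here, so the example stops at the request.

```typescript
interface LearningFilter {
  status?: string; // e.g. "proposed", "active", "deactivated" (lifecycle states from Key Concepts)
  type?: string;
  agent?: string;
}

// Builds GET /api/workspaces/:workspaceId/learnings with optional filters.
// Assumption: filters are passed as query parameters with these exact names.
function learningsUrl(baseUrl: string, workspaceId: string, filter: LearningFilter = {}): string {
  const pairs: string[] = [];
  for (const key of ["status", "type", "agent"] as const) {
    const value = filter[key];
    if (value !== undefined) {
      pairs.push(`${key}=${encodeURIComponent(value)}`);
    }
  }
  const path = `/api/workspaces/${encodeURIComponent(workspaceId)}/learnings`;
  const query = pairs.length > 0 ? `?${pairs.join("&")}` : "";
  // Strip trailing slashes so "http://localhost:3000/" and "http://localhost:3000" behave the same.
  return `${baseUrl.replace(/\/+$/, "")}${path}${query}`;
}

const url = learningsUrl("http://localhost:3000", "<workspace-id>", { status: "proposed" });
console.log(url);

// With the dev server running, the request itself would be something like:
//   const response = await fetch(url);
//   const body = await response.json(); // shape not documented in this README
```

Authentication is left out on purpose: the Auth Layer section describes OAuth 2.1 with DPoP, and the exact headers a client must send are not specified in this document.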

MCP + CLI

Build CLI:

bun run build:cli
# outputs ./osabio

Initialize repo integration:

OSABIO_SERVER_URL=http://localhost:3000 \
OSABIO_WORKSPACE_ID=<workspace-id> \
osabio init

osabio init sets up:

~/.osabio/config.json auth entry
.mcp.json server registration
.claude/settings.json hooks
CLAUDE.md integration block
Osabio slash commands and git hooks

Run MCP directly:

osabio mcp

Useful Scripts

bun run dev
bun run start
bun run typecheck
bun test tests/unit/
bun test --env-file=.env tests/acceptance/
bun run eval
bun run eval:watch
bun migrate

Repository Map

app/
  server.ts                     # Bun entrypoint
  src/client/                   # chat/feed/graph/learning-library/skill-library/agent-registry/tool-registry UI
  src/server/
    observer/                   # graph scanning, LLM verification, peer review, learning diagnosis
    agents/                     # agent CRUD, sandbox adapter, orchestrator agents
    learning/                   # learning CRUD, collision detection, pattern detection
    policy/                     # policy CRUD, validation, versioning, lifecycle
    objective/                  # objective CRUD, alignment evaluator, progress tracking
    behavior/                   # behavior telemetry, scorer, definitions, trend analysis
    intent/                     # intent creation, evidence verification, graduated enforcement
    skill/                      # skill CRUD, lifecycle, agent-skill assignment
    chat/                       # chat agent, tools, context
    extraction/                 # extraction pipeline
    orchestrator/               # sandbox session lifecycle, event bridge, session store
    mcp/                        # agent MCP route, scope engine, intent-gated tool calls
    tools/                      # shared AI SDK tool definitions (chat, PM, observer, proxy)
    tool-registry/              # MCP tool discovery, credential brokerage, governance
    proxy/                      # proxy compliance, sessions, spend, traces
cli/                            # osabio CLI + MCP server
schema/
  surreal-schema.surql          # base schema
  migrations/                   # versioned migrations (84+)
tests/
  unit/                         # deterministic unit tests
  acceptance/                   # acceptance tests (in-process server + isolated DB)
evals/                          # model eval suites + scorers (incl. observer evals)

Status

Early-stage and actively developed.
