Architecture Overview

Version: 1.0 Last Updated: 2026-01-31 Category: Feature Guide

In Plain English

Babysitter is built in layers, like a well-organized office.

Think of it like this:

The Plugin is the receptionist - it takes your requests and routes them to the right department
The SDK is the operations team - it actually does the work
The Journal is the filing cabinet - it keeps a record of everything
The AskUserQuestion Tool is the approval desk - it pauses for human review when needed

Tip for beginners: You don't need to understand the architecture to use Babysitter. This document is for those who want to understand how it works under the hood, or who are building custom processes.

Related: For the conceptual model of how orchestration and AI work together, see Two-Loops Architecture.

Overview

Babysitter uses a modular architecture designed for reliability, debuggability, and extensibility. The system combines a deterministic orchestration engine with adaptive AI capabilities, all backed by an event-sourced persistence layer.

High-Level Architecture

+-----------------------------------------------------------------+
|  Claude Code Session                                             |
|  +-----------------------------------------------------------+  |
|  |  Babysitter Skill (orchestrates via CLI)                  |  |
|  +-----------------------------------------------------------+  |
|                           |                                      |
|                           v                                      |
|  +-----------------------------------------------------------+  |
|  |  .a5c/runs/<runId>/                                       |  |
|  |  +-- run.json        (run metadata)                       |  |
|  |  +-- inputs.json     (run inputs)                         |  |
|  |  +-- code/           (process code)                       |  |
|  |  +-- artifacts/      (output artifacts)                   |  |
|  |  +-- journal/        (event log, individual JSON files)   |  |
|  |  +-- state/state.json (current state)                     |  |
|  |  +-- tasks/<effectId>/ (task artifacts)                   |  |
|  +-----------------------------------------------------------+  |
|                           |                                      |
|                           v                                      |
|  +-----------------------------------------------------------+  |
|  |  AskUserQuestion Tool (human approval)                     |  |
|  +-----------------------------------------------------------+  |
+-----------------------------------------------------------------+

Core Components

1. Babysitter Skill Plugin

Location: plugins/babysitter/skills/babysit/

Responsibilities:

Parses natural language commands into process inputs
Orchestrates the run loop via SDK CLI
Manages iteration lifecycle
Handles resumption from saved state
Reports progress to Claude Code

Technology: Claude Code Plugin System (JavaScript)

2. Babysitter SDK

Package: @a5c-ai/babysitter-sdk

Core Modules:

Module	Purpose	Key Functions
Process Engine	Executes process definitions	`runProcess()`, `iterate()`
Journal Manager	Event-sourced persistence	`append()`, `replay()`, `getState()`
Task Executor	Runs tasks (agent, skill, node)	`executeTask()`, `parallel.all()`
State Manager	Maintains run state cache	`saveState()`, `loadState()`
Hook System	Extensibility points	`registerHook()`, `trigger()`

Technology: Node.js, TypeScript

3. Event-Sourced Journal

Format: Individual JSON files in journal/ directory, one per event, named {SEQ}.{ULID}.json (e.g. 000001.01ARZ3NDEKTSV4RRFFQ69G5FAV.json)

Event Types:

type JournalEvent =
  | { type: 'RUN_CREATED', recordedAt: string, data: { runId: string, inputs: any }, checksum: string }
  | { type: 'EFFECT_REQUESTED', recordedAt: string, data: { effectId: string, kind: string, args: any }, checksum: string }
  | { type: 'EFFECT_RESOLVED', recordedAt: string, data: { effectId: string, result: any }, checksum: string }
  | { type: 'RUN_COMPLETED', recordedAt: string, data: { status: string }, checksum: string }
  | { type: 'RUN_FAILED', recordedAt: string, data: { error: string }, checksum: string }

// Note: seq is derived from the filename, not stored in the event body.
// Breakpoints use EFFECT_REQUESTED with kind: 'breakpoint' and EFFECT_RESOLVED.

Benefits:

Deterministic replay: Reconstruct exact state at any point
Audit trail: Complete history of all actions
Debugging: Trace execution flow and identify issues
Resumability: Continue from last event after interruption

Implementation:

// Write individual JSON file per event
function appendEvent(event, seq) {
  const filename = `${String(seq).padStart(6, '0')}.${ulid()}.json`;
  fs.writeFileSync(path.join(journalDir, filename), JSON.stringify(event, null, 2));
}

// Replay by reading all JSON files from journal/ directory
function replayJournal() {
  const files = fs.readdirSync(journalDir)
    .filter(f => f.endsWith('.json'))
    .sort(); // lexicographic sort preserves sequence order

  const events = files.map(f =>
    JSON.parse(fs.readFileSync(path.join(journalDir, f), 'utf-8'))
  );

  return events.reduce(applyEvent, initialState);
}

For more details on the journal system, see Journal System.

4. Process Definitions

Format: JavaScript/TypeScript functions

Execution Model:

+----------------------------------------------------------+
| Process Definition (JavaScript)                          |
|                                                          |
|  export async function process(inputs, ctx) {           |
|    // User-defined orchestration logic                  |
|    const result = await ctx.task(someTask, args);       |
|    await ctx.breakpoint({ question: '...' });           |
|    return result;                                        |
|  }                                                       |
+----------------------------------------------------------+
                          |
                          v
+----------------------------------------------------------+
| Context API (ctx)                                        |
|                                                          |
|  - ctx.task(task, args, opts)       Execute task        |
|  - ctx.breakpoint(opts)             Wait for approval   |
|    Returns BreakpointResult: { approved, feedback, ... }|
|  - ctx.parallel.all([...])          Run in parallel     |
|  - ctx.hook(name, data)             Trigger hooks       |
|  - ctx.log(msg, data)               Log to journal      |
|  - ctx.getState(key)                Access state        |
|  - ctx.setState(key, value)         Update state        |
+----------------------------------------------------------+

Process Lifecycle:

Load: Process definition loaded from file or default
Initialize: Context created with state and journal access
Execute: Process function called with inputs and context
Iterate: Process may loop internally or be called multiple times
Complete: Process returns final result

For more details on creating processes, see Process Definitions.

5. Task Execution System

Task Types:

Type	Executor	Use Case	Example
Agent	LLM API	Planning, analysis, scoring	GPT-4, Claude
Skill	Claude Code	Code operations	Refactoring, search
Node	Node.js	Scripts and tools	Build, test, deploy
Shell	System shell	Commands	git, npm, docker

Execution Flow:

+---------------------------------------------------------+
| Task Request                                            |
| ctx.task(taskDef, args, opts)                           |
+-----------------+---------------------------------------+
                  |
                  v
+---------------------------------------------------------+
| Task Validation                                         |
| - Validate arguments                                    |
| - Check dependencies                                    |
| - Generate task ID                                      |
+-----------------+---------------------------------------+
                  |
                  v
+---------------------------------------------------------+
| Journal Event: EFFECT_REQUESTED                         |
+-----------------+---------------------------------------+
                  |
                  v
+---------------------------------------------------------+
| Execute Task                                            |
| - Agent: Call LLM API                                   |
| - Skill: Invoke Claude Code skill                       |
| - Node: Run JavaScript function                         |
| - Shell: Execute command                                |
| - Breakpoint: Wait for approval (kind: breakpoint)      |
+-----------------+---------------------------------------+
                  |
                  v
+---------------------------------------------------------+
| Journal Event: EFFECT_RESOLVED                          |
+-----------------+---------------------------------------+
                  |
                  v
+---------------------------------------------------------+
| Return Result                                           |
| - Success: Return task output                           |
| - Failure: Throw error or return error object           |
+---------------------------------------------------------+

Parallel Execution:

// Tasks run concurrently with Promise.all
await ctx.parallel.all([
  () => ctx.task(task1, args1),
  () => ctx.task(task2, args2),
  () => ctx.task(task3, args3)
]);

// All results returned when all complete
// If any fails, entire parallel group fails

For more details on parallel execution, see Parallel Execution.

Data Flow

Complete Request Flow:

1. User Command
   |
   +--> Claude Code
        |
        +--> Babysitter Skill
             |
             +-- Parse intent
             +-- Load/create run
             +--> CLI: npx -y @a5c-ai/babysitter-sdk@latest run:iterate
                  |
                  +--> SDK Process Engine
                       |
                       +-- Load process definition
                       +-- Replay journal -> restore state
                       +-- Execute process function
                       |    |
                       |    +-- ctx.task() -> Execute tasks
                       |    |    |
                       |    |    +-- Append EFFECT_REQUESTED
                       |    |    +-- Run executor (agent/skill/node/shell)
                       |    |    +-- Append EFFECT_RESOLVED
                       |    |
                       |    +--> ctx.breakpoint() -> Wait for approval
                       |         |
                       |         +-- Append EFFECT_REQUESTED (kind: breakpoint)
                       |         +-- Poll for response
                       |         +-- Append EFFECT_RESOLVED
                       |
                       +-- Append iteration events to journal
                       +-- Save state cache
                       +--> Return results to skill
                            |
                            +--> Report to Claude Code
                                 |
                                 +--> Display to user

State Management

Two-Layer State System:

Journal (source of truth):
- Append-only event log
- Immutable history
- Replayed to reconstruct state
State Cache (performance):
- Snapshot of current state
- Rebuilt from journal if missing
- Fast access without replay

State Structure:

interface RunState {
  runId: string;
  status: 'running' | 'paused' | 'completed' | 'failed';
  iteration: number;
  inputs: any;
  outputs?: any;
  processState: Map<string, any>;  // Process-specific state
  taskResults: Map<string, any>;    // Cached task results
  metrics: {
    startTime: number;
    endTime?: number;
    iterations: number;
    qualityScores: number[];
  };
}

Extensibility

Hook System:

// Register custom hooks
ctx.hook('task:completed', async (taskResult) => {
  await sendMetricsToDatadog(taskResult);
});

ctx.hook('quality:score', async (score) => {
  if (score < 70) {
    await sendAlert('Low quality score');
  }
});

// Built-in hook points
- 'run:started'
- 'run:completed'
- 'iteration:started'
- 'iteration:completed'
- 'task:started'
- 'task:completed'
- 'breakpoint:requested'
- 'breakpoint:resolved'
- 'quality:score'

Custom Task Types:

// Define custom task executor
function registerCustomTask(type, executor) {
  taskExecutors.set(type, executor);
}

// Use custom task
await ctx.task({ type: 'custom', fn: myExecutor }, args);

For more details on hooks, see Hooks.

Technology Stack

Component	Technology	Purpose
Plugin	JavaScript	Claude Code integration
SDK	TypeScript + Node.js	Core orchestration engine
Process Definitions	JavaScript/TypeScript	User workflow logic
Journal	Individual JSON files	Event persistence
CLI	Commander.js	Command-line interface

Design Patterns

Event Sourcing:

All state changes recorded as events
State derived from event replay
Time-travel debugging possible

Command Query Responsibility Segregation (CQRS):

Write: Append events to journal
Read: Query state cache or replay

Saga Pattern:

Long-running workflows with compensation
Breakpoints as decision points
Resumable across sessions

Plugin Architecture:

Extensible via hooks
Custom task types
Process definitions as plugins

Summary

Babysitter's architecture is built on these key principles:

Modular Design: Each component has a clear, single responsibility
Event Sourcing: The journal provides a complete, replayable audit trail
Two-Layer State: Journal for truth, cache for performance
Extensibility: Hooks and custom tasks enable integration with any system
Human-in-the-Loop: Breakpoints enables approval workflows

This architecture enables reliable, debuggable, and auditable AI-powered workflows that can be paused, resumed, and replayed at any point.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Architecture Overview

In Plain English

Overview

High-Level Architecture

Core Components

1. Babysitter Skill Plugin

2. Babysitter SDK

3. Event-Sourced Journal

4. Process Definitions

5. Task Execution System

Data Flow

State Management

Extensibility

Technology Stack

Design Patterns

Related Documentation

Summary

FilesExpand file tree

architecture-overview.md

Latest commit

History

architecture-overview.md

File metadata and controls

Architecture Overview

In Plain English

Overview

High-Level Architecture

Core Components

1. Babysitter Skill Plugin

2. Babysitter SDK

3. Event-Sourced Journal

4. Process Definitions

5. Task Execution System

Data Flow

State Management

Extensibility

Technology Stack

Design Patterns

Related Documentation

Summary