DSPy.ts brings Stanford's DSPy to TypeScript: you declare signatures (typed input → output), compose them into modules and pipelines, give an optimizer a metric and a handful of examples, and it tunes the prompts and demonstrations for you — no hand-crafted prompt strings. Underneath, everything that needs a vector index, a memory, or a cache runs on agentdb: HNSW search, RaBitQ quantization, ReasoningBank, reflexion. The optimizers remember what worked across runs.
Prompts are code you can't refactor. DSPy.ts makes the LM program the artifact (a signature plus modules) and lets a metric do the tuning. The TypeScript port adds end-to-end types, runs in Node and the browser, and is built AgentDB-first: optimizer trials, ReAct reflexions, and LM responses all persist to a real vector store, so a second `compile()` (or a second agent run) starts from what the first one learned. Built by rUv.
There is no init step: run `npm install dspy.ts` and your LM program becomes optimizable:
```text
Signature (typed in→out)
      │
      ▼
Module / Pipeline ─────────────────────► LM (optionally CachingLM, a fuzzy AgentDB cache)
(Predict · ChainOfThought                ▲
 · ReAct(+reflexion) · Retrieve)         │
      │                                  │
      ▼                                  │
Optimizer ───► metric + trainset ───► compile() ──► Optimized Module
(BootstrapFewShot · MIPROv2 · GEPA)      │
      │                                  │
      ▼                                  ▼
AgentDB ◄── vectors · RaBitQ · tiers · ReasoningBank · reflexions · experience replay · trace
(HNSW)      ▲                                                   │
            └────────────── warm-start next compile ────────────┘
```
New to DSPy? You don't have to touch AgentDB. Define a signature, wrap it in `ChainOfThought`, hand a `metric` + a few examples to `BootstrapFewShot`, call `compile()`. Everything else (vector store, caching, reflexion) is opt-in.
```bash
npm install dspy.ts
```

```typescript
import {
  ChainOfThought, BootstrapFewShot, MIPROv2,
  configureLM, DummyLM, // swap DummyLM for OpenAI / Anthropic / ONNX / torch
} from 'dspy.ts';

configureLM(new DummyLM()); // or your provider

// 1. Declare a typed signature
const qa = new ChainOfThought<{ question: string }, { answer: string }>({
  name: 'QA',
  signature: {
    inputs: [{ name: 'question', type: 'string', required: true }],
    outputs: [{ name: 'answer', type: 'string', required: true }],
  },
});

// 2. Optimize it against a metric + a handful of examples
const metric = (_in: { question: string }, out: { answer: string }, gold?: { answer: string }) =>
  gold && out.answer?.trim() === gold.answer ? 1 : out.answer ? 0.3 : 0;

const trainset = [
  { input: { question: 'capital of France?' }, output: { answer: 'Paris' } },
  { input: { question: '2 + 2?' }, output: { answer: '4' } },
  { input: { question: 'largest planet?' } }, // unlabeled → bootstrapped demo
];

const compiled = await new BootstrapFewShot(metric).compile(qa, trainset);
const { answer } = await compiled.run({ question: 'capital of Italy?' });
```

Retrieval over an AgentDB vector store:

```typescript
import { AgentDBClient, RetrieveModule } from 'dspy.ts';

const store = new AgentDBClient({ vectorDimension: 384, storage: { inMemory: true } });
await store.init();
await store.storeText('Paris is the capital of France.');
await store.storeText('Rome is the capital of Italy.');

const retrieve = new RetrieveModule({ client: store, k: 3, useMMR: true }); // MMR-diversified
const { passages, context } = await retrieve.run({ query: 'what is the capital of Italy?' });
// feed `context` into a ChainOfThought("question, context -> answer")
```

Experience replay and tracing with MIPROv2:

```typescript
import { AgentDBClient, MIPROv2, CompilationTracer } from 'dspy.ts';

// `metric`, `qa`, and `trainset` are the ones defined in the first example
const replay = new AgentDBClient({ vectorDimension: 64, storage: { inMemory: true } });
await replay.init();

const tracer = new CompilationTracer({ store: replay }); // causal-chain observability
const opt = new MIPROv2(metric, { numTrials: 12, replayStore: replay, tracer });
await opt.compile(qa, trainset);
// a later compile of the same task fingerprint warm-starts from the prior best instruction:
// opt2.result.warmStarted === true
```

| | Capability | Why it matters |
|---|---|---|
| 🧩 | Composable, typed modules | Predict, ChainOfThought, ReAct, Retrieve, Pipeline — declare a signature once, get end-to-end TypeScript types on inputs and outputs. |
| 🎯 | Self-optimizing | BootstrapFewShot, MIPROv2, GEPA — hand them a metric + examples; they tune instructions and demonstrations and hand back an OptimizedModule. Deterministic per seed; save()/load(). |
| 🧬 | GEPA prompt evolution | Genetic-Pareto: a per-example-scored Pareto frontier of prompt candidates, reflect-on-weakest → mutate → admit-if-non-dominated. The frontier persists to AgentDB and re-runs warm-start. |
| ♻️ | Experience replay | MIPROv2 persists each compile's winning instruction (keyed by a task fingerprint); a later compile() of a similar task starts from what worked — stock DSPy starts cold every time. |
| 🔎 | Input-conditioned few-shot | BootstrapFewShot can pick, per input at run time, the demos nearest the current input (vector search) instead of a fixed set. |
| 📚 | RAG, built-in | RetrieveModule over the AgentDB vector store with Maximal-Marginal-Relevance diversity re-rank → declarative Retrieve → ChainOfThought pipelines. |
| 🪞 | ReAct reflexion | ReActReflexion over AgentDB: recall lessons from failed past attempts before acting, record episodes after, and promote a tool sub-strategy to a skill once it succeeds repeatedly. |
| 💾 | AgentDB memory | AgentDBClient — HNSW vector search, RaBitQ 1-bit quantization (~32× smaller), hierarchical tiers (working/short/long), ReasoningBank with semantic retrieval. Real agentdb classes; pure-JS fallback when native deps are absent. |
| ⚡ | Fuzzy LM cache | CachingLM wraps any LMDriver and serves generate() from an AgentDB vector cache — a near-identical prompt is a hit (cosine ≥ threshold, options + TTL aware). |
| 📈 | Observability | CompilationTracer records optimizer runs + trials as a causal chain (causedBy per trial), persisted to AgentDB; optional @mlflow/tracking logging when that dep is present. |
| 🔌 | Multi-provider LM | OpenAI, Anthropic, local ONNX, js-pytorch — configureLM(driver), or compose CachingLM around any of them. |
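
The fuzzy LM cache row above describes a lookup that treats a near-identical prompt as a hit when its cosine similarity clears a threshold, while still respecting generation options and a TTL. The following is a minimal standalone sketch of that idea, not the `CachingLM` implementation: `FuzzyCache`, `toyEmbed`, and the `optionsKey` string are hypothetical names, and the character-trigram embedding is a stand-in for a real embedding model.

```typescript
// Hypothetical sketch of a fuzzy prompt cache; names are illustrative only.
type CacheEntry = { vector: number[]; optionsKey: string; response: string; storedAt: number };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Toy embedding (assumption): lowercase character trigrams hashed into a fixed-size count vector.
function toyEmbed(text: string, dim = 64): number[] {
  const v = new Array<number>(dim).fill(0);
  const s = text.toLowerCase();
  for (let i = 0; i + 3 <= s.length; i++) {
    let h = 0;
    for (let j = i; j < i + 3; j++) h = (h * 31 + s.charCodeAt(j)) >>> 0;
    v[h % dim] += 1;
  }
  return v;
}

class FuzzyCache {
  private entries: CacheEntry[] = [];
  private readonly threshold: number;
  private readonly ttlMs: number;

  constructor(threshold = 0.9, ttlMs = 60_000) {
    this.threshold = threshold;
    this.ttlMs = ttlMs;
  }

  set(prompt: string, optionsKey: string, response: string, now = Date.now()): void {
    this.entries.push({ vector: toyEmbed(prompt), optionsKey, response, storedAt: now });
  }

  // Returns the cached response of the most similar live entry, or undefined on a miss.
  get(prompt: string, optionsKey: string, now = Date.now()): string | undefined {
    const q = toyEmbed(prompt);
    let best: CacheEntry | undefined;
    let bestScore = -1;
    for (const e of this.entries) {
      if (e.optionsKey !== optionsKey) continue;      // different generation options: never a hit
      if (now - e.storedAt > this.ttlMs) continue;    // expired entry
      const score = cosine(q, e.vector);
      if (score > bestScore) { bestScore = score; best = e; }
    }
    return best !== undefined && bestScore >= this.threshold ? best.response : undefined;
  }
}

const demoCache = new FuzzyCache(0.9, 1_000);
demoCache.set('What is the capital of France?', '{}', 'Paris', 0);
// Differs only in letter case, so the toy embedding is identical and the lookup hits.
const demoHit = demoCache.get('what is the capital of france?', '{}', 500);
```

A linear scan is enough to show the decision rule; a real cache would delegate the nearest-neighbour step to the vector store.
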
Modules
| Module | What it does |
|---|---|
| `PredictModule` | Single-step typed prediction: format a prompt from the signature, call the LM, parse the structured output. |
| `ChainOfThought` | Step-by-step reasoning before the answer. |
| `ReAct` | Reasoning + Acting: alternates thoughts and tool calls until a final answer. Optional reflexion config (see `ReActReflexion`). |
| `ReActReflexion` | Episodic memory for ReAct: `recall(taskKey)` lessons + skill plans, `recordEpisode(...)`, promote repeated successful tool sequences to skills. AgentDB-backed. |
| `RetrieveModule` | Vector retrieval over an `AgentDBClient` with MMR diversity; returns `{ passages, context }` for downstream modules. |
| `Pipeline` | Compose modules into a multi-step program with per-step timing and error capture. |
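
The MMR diversity mentioned for `RetrieveModule` can be illustrated with the textbook Maximal Marginal Relevance loop: repeatedly pick the candidate that maximises `lambda * relevance - (1 - lambda) * redundancy`. This is a sketch of the general technique under assumed names (`Passage`, `mmr`, `lambda`); it does not claim to mirror how `RetrieveModule` implements its re-rank.

```typescript
// Hypothetical sketch of Maximal Marginal Relevance re-ranking.
type Passage = { id: string; vector: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// lambda = 1 is pure relevance ranking; lower values trade relevance for diversity.
function mmr(query: number[], candidates: Passage[], k: number, lambda = 0.5): Passage[] {
  const remaining = [...candidates];
  const selected: Passage[] = [];
  while (selected.length < k && remaining.length > 0) {
    let bestIdx = 0;
    let bestScore = -Infinity;
    for (let i = 0; i < remaining.length; i++) {
      const relevance = cosine(query, remaining[i].vector);
      // Redundancy: similarity to the closest passage already selected.
      let redundancy = 0;
      for (const s of selected) {
        redundancy = Math.max(redundancy, cosine(remaining[i].vector, s.vector));
      }
      const score = lambda * relevance - (1 - lambda) * redundancy;
      if (score > bestScore) { bestScore = score; bestIdx = i; }
    }
    selected.push(remaining.splice(bestIdx, 1)[0]);
  }
  return selected;
}

const demoPassages: Passage[] = [
  { id: 'a', vector: [1, 0] },
  { id: 'b', vector: [0.99, 0.05] }, // near-duplicate of 'a'
  { id: 'c', vector: [0.5, 0.85] },  // less relevant, but different
];
const diversified = mmr([1, 0.3], demoPassages, 2, 0.5).map((p) => p.id);
```

With these toy vectors, the diversified pick skips the near-duplicate in favour of the different passage, whereas `lambda = 1` would return the two most similar passages.
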
Optimizers
| Optimizer | What it does |
|---|---|
| `BootstrapFewShot` | Labeled demos from the trainset + bootstrapped demos (run the program, keep what scores ≥ `minScore`). Optional `dynamicDemos: { store, k }` for input-conditioned demo selection at run time. |
| `MIPROv2` | Instruction proposal (LM + template set + the program's own prompt) → demo bootstrapping → seeded search over (instruction × demo-subset) scored on a trainset minibatch → best `OptimizedModule`. `result` exposes the full trial trace. Optional `replayStore` (cross-run warm-start) and `tracer`. |
| `GEPA` | Genetic-Pareto reflective prompt evolution: per-example Pareto frontier, reflect on a candidate's weakest examples → mutate → evaluate → admit if non-dominated → prune. `result` exposes `best` / `frontier` / `reflections`. Optional `frontierStore` (persist + warm-start). |
| `Optimizer` (base) | `compile(program, trainset)` + `save()` / `load()`; extend it for your own search. |
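
The "admit if non-dominated → prune" step in the GEPA row is a standard Pareto-frontier update over per-example scores. A minimal sketch follows, assuming each candidate carries one score per training example; `Candidate`, `dominates`, `admit`, and `weakestExample` are hypothetical names and not the library's API.

```typescript
// Hypothetical sketch of a per-example Pareto frontier update.
type Candidate = { prompt: string; scores: number[] }; // one score per training example

// a dominates b when a is at least as good everywhere and strictly better somewhere.
function dominates(a: number[], b: number[]): boolean {
  let strictlyBetter = false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return false;
    if (a[i] > b[i]) strictlyBetter = true;
  }
  return strictlyBetter;
}

// Admit the candidate unless an existing member dominates it; prune members it dominates.
function admit(frontier: Candidate[], candidate: Candidate): Candidate[] {
  if (frontier.some((f) => dominates(f.scores, candidate.scores))) return frontier;
  const survivors = frontier.filter((f) => !dominates(candidate.scores, f.scores));
  return [...survivors, candidate];
}

// Index of the lowest-scoring example: the one a reflection step would look at first.
function weakestExample(c: Candidate): number {
  let idx = 0;
  for (let i = 1; i < c.scores.length; i++) {
    if (c.scores[i] < c.scores[idx]) idx = i;
  }
  return idx;
}

let demoFrontier: Candidate[] = [];
demoFrontier = admit(demoFrontier, { prompt: 'Answer briefly.', scores: [1, 0, 0.5] });
demoFrontier = admit(demoFrontier, { prompt: 'Think step by step.', scores: [0, 1, 0.5] });
// Both stay: each wins on a different example, so neither dominates the other.
```

Candidates with identical score vectors do not dominate each other in this sketch, so both would be kept; a real optimizer may deduplicate or cap the frontier size.
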
AgentDB memory layer
AgentDBClient is the substrate the rest of DSPy.ts builds on:
- **Vector store**: `store` / `storeText` / `search` / `searchText` / `update` / `delete` / `batchStore` / `getStats`, with `k` / `minScore` / `includeVectors`, a search cache, and stats. Backed by `agentdb`'s `EnhancedEmbeddingService` (transformers.js / ONNX) for text → vector; falls back to a deterministic `hashEmbed` when native deps aren't available.
- **RaBitQ quantization**: `performance.quantization: 'rabitq'` stores a 1-bit-per-dimension sign code (~32× smaller than float32); search becomes a Hamming coarse-filter → cosine re-rank of the top `rerankFactor × k`. `quantizationInfo()` reports the mode and compression ratio.
- **Hierarchical tiers**: `store(v, m, { tier })` (`working` | `short` | `long`, default `long`), `searchTiered({ tiers })`, `promote(id, tier)`, `evictTier(tier, { maxAgeMs, max })`, `tierCounts()`.
- **`ReasoningBank` + `SAFLA`**: knowledge units learned from experiences, embedded via the AgentDB EmbeddingService; `retrieveSemantic(queryText, …)` does real vector search over stored units; SAFLA prunes/evolves the knowledge base.
Some integrations (`agentdb.ReflexionMemory` / `SkillLibrary` / `CausalMemoryGraph`) are currently layered on the `AgentDBClient` vector store with the same behaviour; delegating to those classes directly needs a sql.js `db` handle that `AgentDBClient` doesn't yet expose. Tracked on the issues below.
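
The quantization bullet above describes a two-stage search: a Hamming coarse filter over 1-bit sign codes, then a cosine re-rank of the top `rerankFactor × k`. The sketch below shows that pipeline with plain sign binarization. It is an illustration under assumptions, not the `AgentDBClient` code path, and it omits the random rotation and distance corrections that the full RaBitQ method uses; `signCode`, `hamming`, and `quantizedSearch` are hypothetical names.

```typescript
// Hypothetical sketch: 1-bit sign codes, Hamming coarse filter, cosine re-rank.
type Item = { id: string; vector: number[] };

// One bit per dimension (1 = non-negative), packed eight dimensions per byte.
function signCode(v: number[]): Uint8Array {
  const code = new Uint8Array(Math.ceil(v.length / 8));
  for (let i = 0; i < v.length; i++) {
    if (v[i] >= 0) code[i >> 3] |= 1 << (i & 7);
  }
  return code;
}

// Number of differing bits between two equal-length codes.
function hamming(a: Uint8Array, b: Uint8Array): number {
  let d = 0;
  for (let i = 0; i < a.length; i++) {
    let x = a[i] ^ b[i];
    while (x !== 0) { d += x & 1; x >>= 1; }
  }
  return d;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

function quantizedSearch(query: number[], items: Item[], k: number, rerankFactor = 4) {
  const qCode = signCode(query);
  // Stage 1: cheap Hamming distance on sign codes (a real store would precompute item codes).
  const shortlist = items
    .map((item) => ({ item, dist: hamming(qCode, signCode(item.vector)) }))
    .sort((a, b) => a.dist - b.dist)
    .slice(0, rerankFactor * k);
  // Stage 2: exact cosine on the shortlist only.
  return shortlist
    .map(({ item }) => ({ id: item.id, score: cosine(query, item.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// A 384-dimensional float32 vector is 1536 bytes; its sign code is 48 bytes (32x smaller).
const demoCodeBytes = signCode(new Array<number>(384).fill(1)).length;
```

The re-rank factor controls the recall/latency trade-off: a larger shortlist recovers more true neighbours that the coarse filter mis-ordered, at the cost of more exact cosine computations.
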
Examples
`examples/` includes runnable demos: classification, chain-of-thought, ReAct (incl. a tool agent), fine-tuning, optimization, sentiment, MIPROv2, GEPA (`examples/gepa/`), and AgentDB-backed retrieval (`examples/retrieve/`). Run with `npx ts-node examples/<dir>/index.ts`.
| Doc | Where |
|---|---|
| API reference | npm run docs → docs/api/ (TypeDoc) — also built in CI. |
| Examples | examples/ |
| Migration notes | MIGRATION.md · IMPLEMENTATION_SUMMARY.md |
| Changelog | CHANGELOG.md |
| Issues / roadmap | github.com/ruvnet/dspy.ts/issues |
DSPy.ts is moving fast; current honest gaps (tracked on the issues):
- ONNX `generate()` returns a shape summary, not full text; it needs a real tokenizer + autoregressive decode loop.
- `agentdb.HNSWIndex` is pattern-table-backed (not a generic KNN store), so `AgentDBClient`'s search is currently a JS cosine scan; wiring `agentdb.WASMVectorSearch` (or a ReasoningBank-style table) would speed it up.
- Delegate ReAct reflexion / skills / compilation traces to `agentdb.ReflexionMemory` / `SkillLibrary` / `CausalMemoryGraph` directly once `AgentDBClient` exposes a `db` handle.
- Consolidate the two LM registries (`src/lm/base` vs the package root).
- A Bayesian surrogate over MIPROv2's search grid; a surrogate-guided / true multi-objective Pareto sampler for GEPA.
- Broader test coverage (`agent/swarm`, `lm/providers`, `modules/chain-of-thought` are still light); the CI coverage gate is informational for now.
- `npm run lint` needs the ESLint-9 flat-config migration.
MIT — rUv / ruvnet