DSPy.ts

Program AI systems, don't prompt them — in TypeScript, on AgentDB.

DSPy.ts brings Stanford's DSPy to TypeScript: you declare signatures (typed input → output), compose them into modules and pipelines, give an optimizer a metric and a handful of examples, and it tunes the prompts and demonstrations for you — no hand-crafted prompt strings. Underneath, everything that needs a vector index, a memory, or a cache runs on agentdb: HNSW search, RaBitQ quantization, ReasoningBank, reflexion. The optimizers remember what worked across runs.

Why DSPy.ts?

Prompts are code you can't refactor. DSPy.ts makes the LM program the artifact — a signature plus modules — and lets a metric do the tuning. The TypeScript port adds end-to-end types, runs in Node and the browser, and is built AgentDB-first: optimizer trials, ReAct reflexions, and LM responses all persist to a real vector store, so a second compile() (or a second agent run) starts from what the first one learned. Built by rUv.

What DSPy.ts does

There is no init step: run npm install dspy.ts and your LM program becomes optimizable:

Signature (typed in→out)
        │
        ▼
   Module / Pipeline ──────────────►  LM  (optionally CachingLM — fuzzy AgentDB cache)
   (Predict · ChainOfThought                       ▲
    · ReAct(+reflexion) · Retrieve)                │
        │                                          │
        ▼                                          │
   Optimizer  ───►  metric + trainset  ───►  compile()  ──►  Optimized Module
   (BootstrapFewShot · MIPROv2 · GEPA)               │
        │                                            │
        ▼                                            ▼
   AgentDB  ◄── vectors · RaBitQ · tiers · ReasoningBank · reflexions · experience replay · trace
   (HNSW)        ▲                                   │
                 └────────────── warm-start next compile ┘

New to DSPy? You don't have to touch AgentDB. Define a signature, wrap it in ChainOfThought, hand a metric + a few examples to BootstrapFewShot, call compile(). Everything else (vector store, caching, reflexion) is opt-in.

Quick Start

npm install dspy.ts

import {
  ChainOfThought, BootstrapFewShot,
  configureLM, DummyLM,           // swap DummyLM for OpenAI / Anthropic / ONNX / torch
} from 'dspy.ts';

configureLM(new DummyLM());       // or your provider

// 1. Declare a typed signature
const qa = new ChainOfThought<{ question: string }, { answer: string }>({
  name: 'QA',
  signature: {
    inputs:  [{ name: 'question', type: 'string', required: true }],
    outputs: [{ name: 'answer',   type: 'string', required: true }],
  },
});

// 2. Optimize it against a metric + a handful of examples
const metric = (_in: { question: string }, out: { answer: string }, gold?: { answer: string }) =>
  gold && out.answer?.trim() === gold.answer ? 1 : out.answer ? 0.3 : 0;

const trainset = [
  { input: { question: 'capital of France?' }, output: { answer: 'Paris' } },
  { input: { question: '2 + 2?' },             output: { answer: '4' } },
  { input: { question: 'largest planet?' } },              // unlabeled → bootstrapped demo
];

const compiled = await new BootstrapFewShot(metric).compile(qa, trainset);
const { answer } = await compiled.run({ question: 'capital of Italy?' });
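The "unlabeled → bootstrapped demo" comment above can be read as a simple filter: labeled examples become demos directly, while unlabeled ones are run through the program and kept only if the metric clears a cutoff. The sketch below is a standalone illustration of that reading and not the library's implementation; `collectDemos` and `standInProgram` are hypothetical names, and the program is a synchronous stand-in for an LM call.

```typescript
type Input = { question: string };
type Output = { answer: string };
type Example = { input: Input; output?: Output };
type Demo = { input: Input; output: Output; source: 'labeled' | 'bootstrapped' };

// Same scoring rule as the Quick Start metric: exact match 1, any answer 0.3, empty 0.
const metric = (_in: Input, out: Output, gold?: Output): number =>
  gold && out.answer.trim() === gold.answer ? 1 : out.answer ? 0.3 : 0;

// Hypothetical helper: keep labeled examples as-is, bootstrap the unlabeled ones.
function collectDemos(
  program: (input: Input) => Output,
  trainset: Example[],
  minScore: number,
): Demo[] {
  const demos: Demo[] = [];
  for (const ex of trainset) {
    if (ex.output) {
      demos.push({ input: ex.input, output: ex.output, source: 'labeled' });
      continue;
    }
    const predicted = program(ex.input);            // run the program on the unlabeled input
    if (metric(ex.input, predicted) >= minScore) {  // keep only what scores well enough
      demos.push({ input: ex.input, output: predicted, source: 'bootstrapped' });
    }
  }
  return demos;
}

// Stand-in for an LM-backed module (assumption: answers one question, is silent otherwise).
const standInProgram = (input: Input): Output =>
  ({ answer: input.question === 'largest planet?' ? 'Jupiter' : '' });

const demos = collectDemos(
  standInProgram,
  [
    { input: { question: 'capital of France?' }, output: { answer: 'Paris' } },
    { input: { question: '2 + 2?' }, output: { answer: '4' } },
    { input: { question: 'largest planet?' } },
    { input: { question: 'unknown?' } },
  ],
  0.3,
);
```

With a cutoff of 0.3, the unlabeled example that produced an answer is kept and the one that produced an empty answer is dropped.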

RAG in three lines — `Retrieve → ChainOfThought`

import { AgentDBClient, RetrieveModule } from 'dspy.ts';

const store = new AgentDBClient({ vectorDimension: 384, storage: { inMemory: true } });
await store.init();
await store.storeText('Paris is the capital of France.');
await store.storeText('Rome is the capital of Italy.');

const retrieve = new RetrieveModule({ client: store, k: 3, useMMR: true });   // MMR-diversified
const { passages, context } = await retrieve.run({ query: 'what is the capital of Italy?' });
// feed `context` into a ChainOfThought("question, context -> answer")
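For readers unfamiliar with the `useMMR` option, here is a minimal sketch of Maximal Marginal Relevance over plain number arrays. It is a standalone illustration of the general technique, not the code `RetrieveModule` runs; the `mmr` function, the `lambda` parameter, and the toy two-dimensional vectors are all assumptions made for the example.

```typescript
type Candidate = { id: string; vector: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y; na += x * x; nb += y * y;
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Greedy MMR: each round picks the candidate with the best
// lambda * relevance - (1 - lambda) * (max similarity to anything already picked).
// lambda = 1 ranks purely by relevance; lower values favour diversity.
function mmr(query: number[], candidates: Candidate[], k: number, lambda = 0.5): Candidate[] {
  const remaining = candidates.slice();
  const selected: Candidate[] = [];
  while (selected.length < k && remaining.length > 0) {
    let bestIndex = 0;
    let bestScore = -Infinity;
    for (let i = 0; i < remaining.length; i++) {
      const c = remaining[i]!;
      const relevance = cosine(query, c.vector);
      let redundancy = 0;
      for (const s of selected) redundancy = Math.max(redundancy, cosine(c.vector, s.vector));
      const score = lambda * relevance - (1 - lambda) * redundancy;
      if (score > bestScore) { bestScore = score; bestIndex = i; }
    }
    selected.push(remaining.splice(bestIndex, 1)[0]!);
  }
  return selected;
}

const passagesToRank: Candidate[] = [
  { id: 'rome-1', vector: [1, 0.1] },
  { id: 'rome-2', vector: [1, 0.05] },   // near-duplicate of rome-1
  { id: 'paris',  vector: [0.6, 0.8] },
];
const diverse = mmr([1, 0.2], passagesToRank, 2).map(c => c.id);
const relevanceOnly = mmr([1, 0.2], passagesToRank, 2, 1).map(c => c.id);
```

With the default `lambda`, the second slot goes to the less similar passage rather than the near-duplicate; with `lambda = 1` the two near-duplicates fill both slots.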

Cross-run learning — MIPROv2 with experience replay

import { AgentDBClient, MIPROv2, CompilationTracer } from 'dspy.ts';

const replay = new AgentDBClient({ vectorDimension: 64, storage: { inMemory: true } });
await replay.init();
const tracer = new CompilationTracer({ store: replay });   // causal-chain observability

const opt = new MIPROv2(metric, { numTrials: 12, replayStore: replay, tracer });
await opt.compile(qa, trainset);
// A later compile of the same task fingerprint, from a second optimizer (here called opt2)
// built with the same replayStore, warm-starts from the prior best instruction:
//   opt2.result.warmStarted === true

What you get

| | Capability | Why it matters |
|---|---|---|
| 🧩 | Composable, typed modules | `Predict`, `ChainOfThought`, `ReAct`, `Retrieve`, `Pipeline`: declare a signature once, get end-to-end TypeScript types on inputs and outputs. |
| 🎯 | Self-optimizing | `BootstrapFewShot`, `MIPROv2`, `GEPA`: hand them a metric + examples; they tune instructions and demonstrations and hand back an `OptimizedModule`. Deterministic per `seed`; `save()`/`load()`. |
| 🧬 | GEPA prompt evolution | Genetic-Pareto: a per-example-scored Pareto frontier of prompt candidates, reflect-on-weakest → mutate → admit-if-non-dominated. The frontier persists to AgentDB and re-runs warm-start. |
| ♻️ | Experience replay | MIPROv2 persists each compile's winning instruction (keyed by a task fingerprint); a later `compile()` of a similar task starts from what worked, whereas stock DSPy starts cold every time. |
| 🔎 | Input-conditioned few-shot | `BootstrapFewShot` can pick, per input at run time, the demos nearest the current input (vector search) instead of a fixed set. |
| 📚 | RAG, built-in | `RetrieveModule` over the AgentDB vector store with Maximal-Marginal-Relevance diversity re-rank → declarative `Retrieve → ChainOfThought` pipelines. |
| 🪞 | ReAct reflexion | `ReActReflexion` over AgentDB: recall lessons from failed past attempts before acting, record episodes after, and promote a tool sub-strategy to a skill once it succeeds repeatedly. |
| 💾 | AgentDB memory | `AgentDBClient`: HNSW vector search, RaBitQ 1-bit quantization (~32× smaller), hierarchical tiers (`working`/`short`/`long`), `ReasoningBank` with semantic retrieval. Real `agentdb` classes; pure-JS fallback when native deps are absent. |
| ⚡ | Fuzzy LM cache | `CachingLM` wraps any `LMDriver` and serves `generate()` from an AgentDB vector cache; a near-identical prompt is a hit (cosine ≥ threshold, options + TTL aware). |
| 📈 | Observability | `CompilationTracer` records optimizer runs + trials as a causal chain (`causedBy` per trial), persisted to AgentDB; optional `@mlflow/tracking` logging when that dep is present. |
| 🔌 | Multi-provider LM | OpenAI, Anthropic, local ONNX, js-pytorch: set one with `configureLM(driver)`, or compose `CachingLM` around any of them. |
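The fuzzy cache row describes a hit as "cosine ≥ threshold, options + TTL aware". A minimal sketch of that lookup rule, written over plain arrays, might look like the following. This is not `CachingLM`'s code: `FuzzyCache`, `cachePut`, `cacheGet`, and the string `optionsKey` are hypothetical, and prompt embeddings are passed in as ready-made vectors.

```typescript
type CacheEntry = { vector: number[]; optionsKey: string; response: string; storedAt: number };
type FuzzyCache = { threshold: number; ttlMs: number; entries: CacheEntry[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y; na += x * x; nb += y * y;
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

function cachePut(cache: FuzzyCache, vector: number[], optionsKey: string, response: string, now: number): void {
  cache.entries.push({ vector, optionsKey, response, storedAt: now });
}

// Return the most similar cached response that matches the generation options,
// has not expired, and clears the similarity threshold; otherwise undefined.
function cacheGet(cache: FuzzyCache, vector: number[], optionsKey: string, now: number): string | undefined {
  let best: string | undefined;
  let bestSim = cache.threshold;
  for (const e of cache.entries) {
    if (e.optionsKey !== optionsKey) continue;        // different options: never a hit
    if (now - e.storedAt > cache.ttlMs) continue;     // expired entry
    const sim = cosine(vector, e.vector);
    if (sim >= bestSim) { best = e.response; bestSim = sim; }
  }
  return best;
}

const cache: FuzzyCache = { threshold: 0.95, ttlMs: 60000, entries: [] };
cachePut(cache, [1, 0, 0], 'temperature=0', 'Rome', 0);

const nearHit = cacheGet(cache, [0.99, 0.05, 0], 'temperature=0', 1000);  // near-identical prompt
const farMiss = cacheGet(cache, [0, 1, 0], 'temperature=0', 1000);        // unrelated prompt
const optionsMiss = cacheGet(cache, [1, 0, 0], 'temperature=1', 1000);    // same prompt, other options
const expiredMiss = cacheGet(cache, [1, 0, 0], 'temperature=0', 60001);   // past the TTL
```

Keying on the options as well as the vector keeps a response generated at one temperature from being served for a request at another.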

Modules

| Module | What it does |
|---|---|
| `PredictModule` | Single-step typed prediction: format a prompt from the signature, call the LM, parse the structured output. |
| `ChainOfThought` | Step-by-step reasoning before the answer. |
| `ReAct` | Reasoning + Acting: alternates thoughts and tool calls until a final answer. Optional `reflexion` config (see `ReActReflexion`). |
| `ReActReflexion` | Episodic memory for `ReAct`: `recall(taskKey)` lessons + skill plans, `recordEpisode(...)`, promote repeated successful tool sequences to skills. AgentDB-backed. |
| `RetrieveModule` | Vector retrieval over an `AgentDBClient` with MMR diversity; returns `{ passages, context }` for downstream modules. |
| `Pipeline` | Compose modules into a multi-step program with per-step timing and error capture. |
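The `ReActReflexion` row mentions promoting repeated successful tool sequences to skills. One way to picture that rule is a success counter keyed by the tool sequence, as in the sketch below. Everything here is hypothetical (the `SkillBook` shape, `recordToolEpisode`, the `promoteAfter` default of 3); the real module is AgentDB-backed and its API may differ.

```typescript
type Episode = { taskKey: string; tools: string[]; success: boolean };
type SkillBook = { successes: Record<string, number>; skills: string[] };

// Count successes per tool sequence and promote a sequence to a skill once it
// has succeeded `promoteAfter` times. Returns true only on the promoting call.
function recordToolEpisode(book: SkillBook, episode: Episode, promoteAfter = 3): boolean {
  if (!episode.success) return false;               // failures never count toward promotion
  const key = episode.tools.join(' > ');
  const count = (book.successes[key] ?? 0) + 1;
  book.successes[key] = count;
  if (count >= promoteAfter && book.skills.indexOf(key) === -1) {
    book.skills.push(key);
    return true;
  }
  return false;
}

const book: SkillBook = { successes: {}, skills: [] };
const lookup: Episode = { taskKey: 'capital-question', tools: ['search', 'extract'], success: true };

const promotions = [
  recordToolEpisode(book, lookup),
  recordToolEpisode(book, { taskKey: 'capital-question', tools: ['search', 'extract'], success: false }),
  recordToolEpisode(book, lookup),
  recordToolEpisode(book, lookup),   // third success: promoted here
  recordToolEpisode(book, lookup),   // already a skill: no second promotion
];
```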

Optimizers

| Optimizer | What it does |
|---|---|
| `BootstrapFewShot` | Labeled demos from the trainset + bootstrapped demos (run the program, keep what scores ≥ `minScore`). Optional `dynamicDemos: { store, k }` for input-conditioned demo selection at run time. |
| `MIPROv2` | Instruction proposal (LM + template set + the program's own prompt) → demo bootstrapping → seeded search over (instruction × demo-subset) scored on a trainset minibatch → best `OptimizedModule`. `result` exposes the full trial trace. Optional `replayStore` (cross-run warm-start) and `tracer`. |
| `GEPA` | Genetic-Pareto reflective prompt evolution: per-example Pareto frontier, reflect on a candidate's weakest examples → mutate → evaluate → admit if non-dominated → prune. `result` exposes `best` / `frontier` / `reflections`. Optional `frontierStore` (persist + warm-start). |
| `Optimizer` (base) | `compile(program, trainset)` + `save()` / `load()`; extend it for your own search. |
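The "admit if non-dominated → prune" step in the GEPA row is standard Pareto bookkeeping over per-example scores. The sketch below shows that step in isolation, under the assumption that every candidate carries one score per training example; `PromptCandidate`, `dominates`, and `admit` are illustrative names, not the library's API.

```typescript
type PromptCandidate = { prompt: string; scores: number[] };   // one score per training example

// a dominates b when it is at least as good on every example and strictly better on one.
function dominates(a: number[], b: number[]): boolean {
  let strictlyBetter = false;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    if (x < y) return false;
    if (x > y) strictlyBetter = true;
  }
  return strictlyBetter;
}

// Reject a dominated newcomer; otherwise add it and prune whatever it dominates.
function admit(frontier: PromptCandidate[], candidate: PromptCandidate): PromptCandidate[] {
  if (frontier.some(f => dominates(f.scores, candidate.scores))) return frontier;
  return frontier.filter(f => !dominates(candidate.scores, f.scores)).concat([candidate]);
}

let frontier: PromptCandidate[] = [];
frontier = admit(frontier, { prompt: 'A', scores: [1, 0, 0.3] });
frontier = admit(frontier, { prompt: 'B', scores: [0, 1, 0.3] });   // wins on a different example: kept
const afterWeak = admit(frontier, { prompt: 'C', scores: [0, 0, 0.3] });    // dominated by A: rejected
const afterStrong = admit(frontier, { prompt: 'D', scores: [1, 1, 0.3] });  // dominates A and B: replaces them
```

Scoring per example rather than on the average is what lets A and B coexist: each is best somewhere, so neither dominates the other.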

AgentDB memory layer

AgentDBClient is the substrate the rest of DSPy.ts builds on:

Vector store — store / storeText / search / searchText / update / delete / batchStore / getStats, with k / minScore / includeVectors, a search cache, and stats. Backed by agentdb's EnhancedEmbeddingService (transformers.js / ONNX) for text → vector; falls back to a deterministic hashEmbed when native deps aren't available.
RaBitQ quantization — performance.quantization: 'rabitq' stores a 1-bit-per-dimension sign code (~32× smaller than float32); search becomes a Hamming coarse-filter → cosine re-rank of the top rerankFactor × k. quantizationInfo() reports the mode and compression ratio.
Hierarchical tiers — store(v, m, { tier }) (working | short | long, default long), searchTiered({ tiers }), promote(id, tier), evictTier(tier, { maxAgeMs, max }), tierCounts().
ReasoningBank + SAFLA — knowledge units learned from experiences, embedded via the AgentDB EmbeddingService; retrieveSemantic(queryText, …) does real vector search over stored units; SAFLA prunes/evolves the knowledge base.
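The quantization item above describes a 1-bit-per-dimension sign code with a Hamming coarse filter followed by a cosine re-rank. The sketch below shows that two-stage search over plain arrays. It is a simplified illustration and not agentdb's RaBitQ implementation (a full RaBitQ scheme involves more than taking signs); `signCode`, `hamming`, `quantizedSearch`, and the `Row` shape are hypothetical names.

```typescript
type Row = { id: string; vector: number[]; code: Uint8Array };

// Pack one sign bit per dimension: a float32 vector of d dims (4d bytes) becomes d/8 bytes.
function signCode(v: number[]): Uint8Array {
  const code = new Uint8Array(Math.ceil(v.length / 8));
  for (let i = 0; i < v.length; i++) {
    if ((v[i] ?? 0) >= 0) code[i >> 3] = (code[i >> 3] ?? 0) | (1 << (i & 7));
  }
  return code;
}

// Number of differing bits between two codes.
function hamming(a: Uint8Array, b: Uint8Array): number {
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    let x = (a[i] ?? 0) ^ (b[i] ?? 0);
    while (x) { distance += x & 1; x >>= 1; }
  }
  return distance;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y; na += x * x; nb += y * y;
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Stage 1: cheap Hamming shortlist of rerankFactor * k rows. Stage 2: exact cosine re-rank.
function quantizedSearch(query: number[], rows: Row[], k: number, rerankFactor = 4): string[] {
  const queryCode = signCode(query);
  const shortlist = rows
    .map(row => ({ row, distance: hamming(queryCode, row.code) }))
    .sort((a, b) => a.distance - b.distance)
    .slice(0, rerankFactor * k)
    .map(x => x.row);
  return shortlist
    .map(row => ({ id: row.id, score: cosine(query, row.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map(x => x.id);
}

const toRow = (id: string, vector: number[]): Row => ({ id, vector, code: signCode(vector) });
const rows: Row[] = [
  toRow('far',  [-1, -1, -1, -1, 1, 1, 1, 1]),
  toRow('mid',  [1, 1, -1, -1, -1, -1, 1, 1]),
  toRow('near', [0.9, 1.1, 1, 1, -1, -1, -0.8, -1]),
];
const queryVector = [1, 1, 1, 1, -1, -1, -1, -1];
const top = quantizedSearch(queryVector, rows, 1, 2);
```

The size arithmetic behind "~32×": a 384-dimensional float32 vector takes 384 × 4 = 1536 bytes, while its sign code takes 384 / 8 = 48 bytes.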

Some integrations (agentdb.ReflexionMemory / SkillLibrary / CausalMemoryGraph) are currently layered on the AgentDBClient vector store with the same behaviour. Delegating to those classes directly needs a sql.js db handle that AgentDBClient doesn't yet expose; this is tracked in the issues linked under Documentation below.

Examples

examples/ includes runnable demos — classification, chain-of-thought, ReAct (incl. a tool agent), fine-tuning, optimization, sentiment, MIPROv2, GEPA (examples/gepa/), and AgentDB-backed retrieval (examples/retrieve/). Run with npx ts-node examples/<dir>/index.ts.

Documentation

| Doc | Where |
|---|---|
| API reference | `npm run docs` → `docs/api/` (TypeDoc); also built in CI. |
| Examples | `examples/` |
| Migration notes | `MIGRATION.md` · `IMPLEMENTATION_SUMMARY.md` |
| Changelog | `CHANGELOG.md` |
| Issues / roadmap | github.com/ruvnet/dspy.ts/issues |

Roadmap & known limitations

DSPy.ts is moving fast. The current known gaps, all tracked in the issues, are:

ONNX generate() returns a shape summary, not full text — needs a real tokenizer + autoregressive decode loop.
agentdb.HNSWIndex is pattern-table-backed (not a generic KNN store), so AgentDBClient's search is currently a JS cosine scan — wiring agentdb.WASMVectorSearch (or a ReasoningBank-style table) would speed it up.
Delegate ReAct reflexion / skills / compilation traces to agentdb.ReflexionMemory / SkillLibrary / CausalMemoryGraph directly once AgentDBClient exposes a db handle.
Consolidate the two LM registries (src/lm/base vs the package root).
A Bayesian surrogate over MIPROv2's search grid; a surrogate-guided / true multi-objective Pareto sampler for GEPA.
Broader test coverage (agent/swarm, lm/providers, modules/chain-of-thought are still light); the CI coverage gate is informational for now.
npm run lint needs the ESLint-9 flat-config migration.

License

MIT — rUv / ruvnet
