
ruvnet/dspy.ts



DSPy.ts

Program AI systems, don't prompt them — in TypeScript, on AgentDB.

DSPy.ts brings Stanford's DSPy to TypeScript: you declare signatures (typed input → output), compose them into modules and pipelines, give an optimizer a metric and a handful of examples, and it tunes the prompts and demonstrations for you — no hand-crafted prompt strings. Underneath, everything that needs a vector index, a memory, or a cache runs on agentdb: HNSW search, RaBitQ quantization, ReasoningBank, reflexion. The optimizers remember what worked across runs.

Why DSPy.ts?

Prompts are code you can't refactor. DSPy.ts makes the LM program the artifact — a signature plus modules — and lets a metric do the tuning. The TypeScript port adds end-to-end types, runs in Node and the browser, and is built AgentDB-first: optimizer trials, ReAct reflexions, and LM responses all persist to a real vector store, so a second compile() (or a second agent run) starts from what the first one learned. Built by rUv.

What DSPy.ts does

No init step: run npm install dspy.ts and your LM program becomes optimizable:

Signature (typed in→out)
        │
        ▼
   Module / Pipeline ──────────────►  LM  (optionally CachingLM — fuzzy AgentDB cache)
   (Predict · ChainOfThought                       ▲
    · ReAct(+reflexion) · Retrieve)                │
        │                                          │
        ▼                                          │
   Optimizer  ───►  metric + trainset  ───►  compile()  ──►  Optimized Module
   (BootstrapFewShot · MIPROv2 · GEPA)               │
        │                                            │
        ▼                                            ▼
   AgentDB  ◄── vectors · RaBitQ · tiers · ReasoningBank · reflexions · experience replay · trace
   (HNSW)        ▲                                   │
                 └────────────── warm-start next compile ┘

New to DSPy? You don't have to touch AgentDB. Define a signature, wrap it in ChainOfThought, hand a metric + a few examples to BootstrapFewShot, call compile(). Everything else (vector store, caching, reflexion) is opt-in.


Quick Start

npm install dspy.ts
import {
  ChainOfThought, BootstrapFewShot, MIPROv2,
  configureLM, DummyLM,           // swap DummyLM for OpenAI / Anthropic / ONNX / torch
} from 'dspy.ts';

configureLM(new DummyLM());       // or your provider

// 1. Declare a typed signature
const qa = new ChainOfThought<{ question: string }, { answer: string }>({
  name: 'QA',
  signature: {
    inputs:  [{ name: 'question', type: 'string', required: true }],
    outputs: [{ name: 'answer',   type: 'string', required: true }],
  },
});

// 2. Optimize it against a metric + a handful of examples
const metric = (_in: { question: string }, out: { answer: string }, gold?: { answer: string }) =>
  gold && out.answer?.trim() === gold.answer ? 1 : out.answer ? 0.3 : 0;

const trainset = [
  { input: { question: 'capital of France?' }, output: { answer: 'Paris' } },
  { input: { question: '2 + 2?' },             output: { answer: '4' } },
  { input: { question: 'largest planet?' } },              // unlabeled → bootstrapped demo
];

const compiled = await new BootstrapFewShot(metric).compile(qa, trainset);
const { answer } = await compiled.run({ question: 'capital of Italy?' });
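Conceptually, the bootstrapping step that turns the unlabeled "largest planet?" example into a demo runs the program on that input, scores the output with the metric, and keeps it if the score clears a threshold. The sketch below illustrates that idea under stated assumptions. It is not dspy.ts's implementation. `bootstrapDemos`, `Example`, `Demo`, and the fixed score of 1 for labeled examples are hypothetical.

```typescript
// Hypothetical types for this sketch; they are not exported by dspy.ts.
type Example<I, O> = { input: I; output?: O };
type Demo<I, O> = { input: I; output: O; score: number };
type Metric<I, O> = (input: I, predicted: O, gold?: O) => number;

// Collect up to `maxDemos` demonstrations:
// - labeled examples are used directly (score 1 is an assumption of this sketch);
// - unlabeled examples are run through `program`, scored with `metric`,
//   and kept only when the score is at least `minScore`.
async function bootstrapDemos<I, O>(
  program: (input: I) => Promise<O>,
  trainset: Example<I, O>[],
  metric: Metric<I, O>,
  minScore: number,
  maxDemos: number,
): Promise<Demo<I, O>[]> {
  const demos: Demo<I, O>[] = [];
  for (const ex of trainset) {
    if (demos.length >= maxDemos) break;
    if (ex.output !== undefined) {
      demos.push({ input: ex.input, output: ex.output, score: 1 });
      continue;
    }
    const predicted = await program(ex.input);
    const score = metric(ex.input, predicted); // no gold label to compare against
    if (score >= minScore) demos.push({ input: ex.input, output: predicted, score });
  }
  return demos;
}
```

With the Quick Start metric, an unlabeled example scores 0.3 whenever the program returns any non-empty answer, so a threshold of 0.3 admits it and a threshold above 0.3 rejects it.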

RAG in three lines — Retrieve → ChainOfThought

import { AgentDBClient, RetrieveModule } from 'dspy.ts';

const store = new AgentDBClient({ vectorDimension: 384, storage: { inMemory: true } });
await store.init();
await store.storeText('Paris is the capital of France.');
await store.storeText('Rome is the capital of Italy.');

const retrieve = new RetrieveModule({ client: store, k: 3, useMMR: true });   // MMR-diversified
const { passages, context } = await retrieve.run({ query: 'what is the capital of Italy?' });
// feed `context` into a ChainOfThought("question, context -> answer")
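The `useMMR: true` option refers to Maximal Marginal Relevance re-ranking. A minimal sketch of the standard greedy MMR selection is shown below, assuming the candidates already carry embedding vectors. `mmr` and its `lambda` trade-off parameter are illustrative names, not RetrieveModule's actual options.

```typescript
// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Greedy MMR: repeatedly pick the candidate maximizing
//   lambda * sim(query, d) - (1 - lambda) * max_{s in selected} sim(d, s).
// Returns indices into `candidates`, in selection order.
function mmr(query: number[], candidates: number[][], k: number, lambda = 0.5): number[] {
  const selected: number[] = [];
  const remaining: number[] = candidates.map((_, i) => i);
  while (selected.length < k && remaining.length > 0) {
    let bestPos = 0;
    let bestScore = -Infinity;
    for (let pos = 0; pos < remaining.length; pos++) {
      const idx = remaining[pos];
      const relevance = cosine(query, candidates[idx]);
      // Redundancy: similarity to the closest already-selected passage (0 before the first pick).
      const redundancy =
        selected.length === 0
          ? 0
          : selected.reduce((m, j) => Math.max(m, cosine(candidates[idx], candidates[j])), -Infinity);
      const score = lambda * relevance - (1 - lambda) * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestPos = pos;
      }
    }
    selected.push(remaining[bestPos]);
    remaining.splice(bestPos, 1);
  }
  return selected;
}
```

A low `lambda` favours diversity (a near-duplicate of an already-chosen passage is skipped), while a `lambda` close to 1 approaches plain relevance ranking.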

Cross-run learning — MIPROv2 with experience replay

import { MIPROv2, CompilationTracer } from 'dspy.ts';

const replay = new AgentDBClient({ vectorDimension: 64, storage: { inMemory: true } });
await replay.init();
const tracer = new CompilationTracer({ store: replay });   // causal-chain observability

const opt = new MIPROv2(metric, { numTrials: 12, replayStore: replay, tracer });
await opt.compile(qa, trainset);
// a later compile of the same task fingerprint (e.g. a second MIPROv2 instance, opt2, sharing replayStore)
// warm-starts from the prior best instruction: opt2.result.warmStarted === true
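The replay idea (persist the winning instruction under a task fingerprint, then seed the next compile with it) can be sketched with an in-memory map. Everything here is hypothetical: `taskFingerprint`, `ReplayStore`, and the fingerprint recipe (sorted signature field names hashed with FNV-1a) are illustrative choices, and exact-match lookup stands in for the "similar task" matching the README describes.

```typescript
// Hypothetical reduced view of a signature: just its field names.
interface SignatureShape { inputs: string[]; outputs: string[] }
interface ReplayRecord { instruction: string; score: number }

// 32-bit FNV-1a hash over UTF-16 code units, rendered as 8 hex digits.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, '0');
}

// Fingerprint a task by its sorted input and output field names.
function taskFingerprint(sig: SignatureShape): string {
  const key = `${[...sig.inputs].sort().join(',')}->${[...sig.outputs].sort().join(',')}`;
  return fnv1a(key);
}

// In-memory stand-in for a persistent replay store: keeps the best record per fingerprint.
class ReplayStore {
  private best = new Map<string, ReplayRecord>();

  record(fp: string, rec: ReplayRecord): void {
    const prev = this.best.get(fp);
    if (!prev || rec.score > prev.score) this.best.set(fp, rec);
  }

  // The instruction a new compile would start from, or undefined for a cold start.
  warmStart(fp: string): ReplayRecord | undefined {
    return this.best.get(fp);
  }
}
```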

What you get

Capability: why it matters
🧩 Composable, typed modules: Predict, ChainOfThought, ReAct, Retrieve, Pipeline. Declare a signature once and get end-to-end TypeScript types on inputs and outputs.
🎯 Self-optimizing: BootstrapFewShot, MIPROv2, GEPA. Hand them a metric + examples; they tune instructions and demonstrations and return an OptimizedModule. Deterministic per seed; save()/load().
🧬 GEPA prompt evolution: Genetic-Pareto search over a Pareto frontier of prompt candidates scored per example, looping reflect on the weakest examples → mutate → admit if non-dominated. The frontier persists to AgentDB, so re-runs warm-start.
♻️ Experience replay: MIPROv2 persists each compile's winning instruction (keyed by a task fingerprint); a later compile() of a similar task starts from what worked, whereas stock DSPy starts cold every time.
🔎 Input-conditioned few-shot: BootstrapFewShot can pick, per input at run time, the demos nearest the current input (vector search) instead of a fixed set.
📚 RAG, built in: RetrieveModule over the AgentDB vector store with Maximal Marginal Relevance (MMR) diversity re-ranking, for declarative Retrieve → ChainOfThought pipelines.
🪞 ReAct reflexion: ReActReflexion over AgentDB recalls lessons from failed past attempts before acting, records episodes afterwards, and promotes a tool sub-strategy to a skill once it succeeds repeatedly.
💾 AgentDB memory: AgentDBClient, with HNSW vector search, RaBitQ 1-bit quantization (~32× smaller), hierarchical tiers (working/short/long), and a ReasoningBank with semantic retrieval. Uses real agentdb classes, with a pure-JS fallback when native deps are absent.
Fuzzy LM cache: CachingLM wraps any LMDriver and serves generate() from an AgentDB vector cache; a near-identical prompt is a hit (cosine ≥ threshold, options- and TTL-aware).
📈 Observability: CompilationTracer records optimizer runs and trials as a causal chain (causedBy per trial), persisted to AgentDB; optional @mlflow/tracking logging when that dependency is present.
🔌 Multi-provider LM: OpenAI, Anthropic, local ONNX, js-pytorch. Use configureLM(driver), or compose CachingLM around any of them.
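The fuzzy-cache rule above (a hit when cosine similarity to a cached prompt is at least a threshold, with generation options and TTL taken into account) can be sketched in memory. This is an illustration, not CachingLM itself: `FuzzyCache`, the `embed` callback, and the 0.95 / 60 s defaults are assumptions, and a linear scan stands in for AgentDB's vector search.

```typescript
type Embed = (text: string) => number[];

interface CacheEntry {
  vector: number[];   // embedding of the cached prompt
  optionsKey: string; // serialized generation options; must match exactly for a hit
  response: string;
  expiresAt: number;  // epoch ms after which the entry is ignored
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

class FuzzyCache {
  private entries: CacheEntry[] = [];
  private embed: Embed;
  private threshold: number;
  private ttlMs: number;

  constructor(embed: Embed, threshold = 0.95, ttlMs = 60000) {
    this.embed = embed;
    this.threshold = threshold;
    this.ttlMs = ttlMs;
  }

  // Most similar unexpired response stored with identical options, if it clears the threshold.
  get(prompt: string, options: Record<string, unknown>, now: number = Date.now()): string | undefined {
    const query = this.embed(prompt);
    const key = JSON.stringify(options); // note: sensitive to property order
    let best: CacheEntry | undefined;
    let bestSim = -Infinity;
    for (const e of this.entries) {
      if (e.optionsKey !== key || e.expiresAt <= now) continue;
      const sim = cosine(query, e.vector);
      if (sim >= this.threshold && sim > bestSim) {
        best = e;
        bestSim = sim;
      }
    }
    return best ? best.response : undefined;
  }

  set(prompt: string, options: Record<string, unknown>, response: string, now: number = Date.now()): void {
    this.entries.push({ vector: this.embed(prompt), optionsKey: JSON.stringify(options), response, expiresAt: now + this.ttlMs });
  }
}
```

Keying on the options means a response generated at temperature 0 is never served for a temperature-1 request, even for the same prompt.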
Modules
Module: what it does
PredictModule: single-step typed prediction. Format a prompt from the signature, call the LM, parse the structured output.
ChainOfThought: step-by-step reasoning before the answer.
ReAct: Reasoning + Acting; alternates thoughts and tool calls until a final answer. Optional reflexion config (see ReActReflexion).
ReActReflexion: episodic memory for ReAct. recall(taskKey) returns lessons + skill plans, recordEpisode(...) stores an attempt, and repeated successful tool sequences are promoted to skills. AgentDB-backed.
RetrieveModule: vector retrieval over an AgentDBClient with MMR diversity; returns { passages, context } for downstream modules.
Pipeline: compose modules into a multi-step program with per-step timing and error capture.
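The ReActReflexion cycle (record each episode, recall lessons from failed attempts, and promote a tool sequence to a skill once it has succeeded repeatedly for a task) can be sketched in memory. `EpisodeLog`, `Episode`, the `promoteAfter` count, and exact-sequence matching are hypothetical; the method names echo the table above, but these signatures are assumptions, and the real module is AgentDB-backed.

```typescript
// Hypothetical episode record for this sketch.
interface Episode {
  taskKey: string;
  tools: string[];   // tool calls made, in order
  success: boolean;
  lesson?: string;   // what went wrong, for failed attempts
}

class EpisodeLog {
  private episodes: Episode[] = [];
  private skills = new Map<string, string[]>();
  private promoteAfter: number;

  constructor(promoteAfter = 3) {
    this.promoteAfter = promoteAfter;
  }

  // Store the episode; promote its tool sequence to a skill once that exact
  // sequence has succeeded `promoteAfter` times for the same task.
  recordEpisode(ep: Episode): void {
    this.episodes.push(ep);
    if (!ep.success) return;
    const seq = JSON.stringify(ep.tools);
    const wins = this.episodes.filter(
      (e) => e.taskKey === ep.taskKey && e.success && JSON.stringify(e.tools) === seq,
    ).length;
    if (wins >= this.promoteAfter) this.skills.set(ep.taskKey, [...ep.tools]);
  }

  // Lessons from failed attempts plus the promoted skill plan (if any), read before acting.
  recall(taskKey: string): { lessons: string[]; skill: string[] | undefined } {
    const lessons = this.episodes
      .filter((e) => e.taskKey === taskKey && !e.success && e.lesson !== undefined)
      .map((e) => e.lesson as string);
    return { lessons, skill: this.skills.get(taskKey) };
  }
}
```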
Optimizers
Optimizer: what it does
BootstrapFewShot: labeled demos from the trainset + bootstrapped demos (run the program, keep what scores ≥ minScore). Optional dynamicDemos: { store, k } for input-conditioned demo selection at run time.
MIPROv2: instruction proposal (LM + template set + the program's own prompt) → demo bootstrapping → seeded search over (instruction × demo-subset) scored on a trainset minibatch → best OptimizedModule. result exposes the full trial trace. Optional replayStore (cross-run warm-start) and tracer.
GEPA: Genetic-Pareto reflective prompt evolution. Keeps a per-example Pareto frontier; reflect on a candidate's weakest examples → mutate → evaluate → admit if non-dominated → prune. result exposes best / frontier / reflections. Optional frontierStore (persist + warm-start).
Optimizer (base): compile(program, trainset) + save() / load(); extend it for your own search.
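GEPA's admit-if-non-dominated step works on per-example score vectors. A candidate enters the frontier only if no member scores at least as well on every example and strictly better on at least one, and members it dominates are pruned. A minimal sketch, with the reflection and mutation steps omitted; `Candidate`, `dominates`, `admit`, and `weakestExamples` are hypothetical names.

```typescript
// scores[i] = metric value of this prompt on trainset example i.
interface Candidate { prompt: string; scores: number[] }

// a dominates b if a >= b on every example and a > b on at least one.
function dominates(a: number[], b: number[]): boolean {
  let strictlyBetter = false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return false;
    if (a[i] > b[i]) strictlyBetter = true;
  }
  return strictlyBetter;
}

// Admit `c` unless some frontier member dominates it; prune members that `c` dominates.
// Candidates with identical score vectors do not dominate each other, so both are kept.
function admit(frontier: Candidate[], c: Candidate): { frontier: Candidate[]; admitted: boolean } {
  if (frontier.some((f) => dominates(f.scores, c.scores))) return { frontier, admitted: false };
  return { frontier: [...frontier.filter((f) => !dominates(c.scores, f.scores)), c], admitted: true };
}

// Indices of the `n` examples a candidate scores worst on: the input to the reflection step.
function weakestExamples(c: Candidate, n: number): number[] {
  return c.scores
    .map((s, i) => [s, i] as const)
    .sort((x, y) => x[0] - y[0])
    .slice(0, n)
    .map(([, i]) => i);
}
```

Keeping every non-dominated candidate, rather than only the best average, preserves prompts that are strongest on some subset of examples, which is the point of a per-example frontier.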
AgentDB memory layer

AgentDBClient is the substrate the rest of DSPy.ts builds on:

  • Vector store: store / storeText / search / searchText / update / delete / batchStore / getStats, with k / minScore / includeVectors options and a search cache. Backed by agentdb's EnhancedEmbeddingService (transformers.js / ONNX) for text → vector; falls back to a deterministic hashEmbed when native deps aren't available.
  • RaBitQ quantization: performance.quantization: 'rabitq' stores a 1-bit-per-dimension sign code (~32× smaller than float32); search becomes a Hamming-distance coarse filter followed by cosine re-ranking of the top rerankFactor × k. quantizationInfo() reports the mode and compression ratio.
  • Hierarchical tiers: store(v, m, { tier }) (working | short | long, default long), searchTiered({ tiers }), promote(id, tier), evictTier(tier, { maxAgeMs, max }), tierCounts().
  • ReasoningBank + SAFLA: knowledge units learned from experiences, embedded via the AgentDB EmbeddingService; retrieveSemantic(queryText, …) does real vector search over stored units; SAFLA prunes/evolves the knowledge base.
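The quantized search path in the RaBitQ bullet (a 1-bit sign code per dimension, a Hamming coarse filter, then exact cosine re-ranking of the top rerankFactor × k) can be sketched as below. This is plain sign quantization, a simplification: published RaBitQ also applies a random rotation and distance-estimate corrections, both omitted here. The sketch assumes full-precision vectors remain available for the re-rank; `signCode`, `hamming`, and `quantizedSearch` are hypothetical names.

```typescript
// Pack one bit per dimension: 1 if the value is >= 0, else 0 (32 dimensions per word).
function signCode(v: number[]): Uint32Array {
  const code = new Uint32Array(Math.ceil(v.length / 32));
  for (let i = 0; i < v.length; i++) {
    if (v[i] >= 0) code[i >>> 5] |= 1 << (i & 31);
  }
  return code;
}

// Number of set bits in a 32-bit word (SWAR popcount).
function popcount32(x: number): number {
  x = x - ((x >>> 1) & 0x55555555);
  x = (x & 0x33333333) + ((x >>> 2) & 0x33333333);
  x = (x + (x >>> 4)) & 0x0f0f0f0f;
  return Math.imul(x, 0x01010101) >>> 24;
}

// Hamming distance between two sign codes of equal length.
function hamming(a: Uint32Array, b: Uint32Array): number {
  let d = 0;
  for (let i = 0; i < a.length; i++) d += popcount32((a[i] ^ b[i]) >>> 0);
  return d;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Coarse filter by Hamming distance, then exact cosine re-rank; returns top-k indices.
function quantizedSearch(query: number[], vectors: number[][], codes: Uint32Array[], k: number, rerankFactor = 4): number[] {
  const q = signCode(query);
  const shortlist = codes
    .map((c, i) => ({ i, d: hamming(q, c) }))
    .sort((x, y) => x.d - y.d)
    .slice(0, rerankFactor * k)
    .map((x) => x.i);
  return shortlist
    .map((i) => ({ i, s: cosine(query, vectors[i]) }))
    .sort((x, y) => y.s - x.s)
    .slice(0, k)
    .map((x) => x.i);
}
```

The ~32× figure follows from the storage: one bit per dimension instead of a 32-bit float.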

Some integrations (agentdb.ReflexionMemory / SkillLibrary / CausalMemoryGraph) are currently layered on the AgentDBClient vector store with the same behaviour — delegating to those classes directly needs a sql.js db handle that AgentDBClient doesn't yet expose. Tracked on the issues below.

Examples

examples/ includes runnable demos — classification, chain-of-thought, ReAct (incl. a tool agent), fine-tuning, optimization, sentiment, MIPROv2, GEPA (examples/gepa/), and AgentDB-backed retrieval (examples/retrieve/). Run with npx ts-node examples/<dir>/index.ts.


Documentation

Doc: where
API reference: npm run docs → docs/api/ (TypeDoc), also built in CI.
Examples: examples/
Migration notes: MIGRATION.md · IMPLEMENTATION_SUMMARY.md
Changelog: CHANGELOG.md
Issues / roadmap: github.com/ruvnet/dspy.ts/issues

Roadmap & known limitations

DSPy.ts is moving fast; these are the current known gaps (tracked in the GitHub issues):

  • ONNX generate() returns a shape summary, not full text — needs a real tokenizer + autoregressive decode loop.
  • agentdb.HNSWIndex is pattern-table-backed (not a generic KNN store), so AgentDBClient's search is currently a JS cosine scan — wiring agentdb.WASMVectorSearch (or a ReasoningBank-style table) would speed it up.
  • Delegate ReAct reflexion / skills / compilation traces to agentdb.ReflexionMemory / SkillLibrary / CausalMemoryGraph directly once AgentDBClient exposes a db handle.
  • Consolidate the two LM registries (src/lm/base vs the package root).
  • A Bayesian surrogate over MIPROv2's search grid; a surrogate-guided / true multi-objective Pareto sampler for GEPA.
  • Broader test coverage (agent/swarm, lm/providers, modules/chain-of-thought are still light); the CI coverage gate is informational for now.
  • npm run lint needs the ESLint-9 flat-config migration.

License

MIT — rUv / ruvnet

About

DS.js (Declarative Self-learning JavaScript)
