This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Create-llama is a monorepo containing CLI tools and server frameworks for building LlamaIndex-powered applications. The repository combines TypeScript/Node.js and Python components in a unified development environment.
- `packages/create-llama/`: Main CLI tool for scaffolding LlamaIndex applications
- `python/llama-index-server/`: Python/FastAPI server framework
- Root: Workspace configuration and shared development tools
- Package Manager: pnpm with workspace configuration
- Build Tools: bunchee (TypeScript), Next.js, hatchling (Python)
- Testing: Playwright for e2e, pytest for Python
- Version Management: changesets for TypeScript packages, manual for Python
```shell
pnpm dev     # Start all packages in development mode
pnpm build   # Build all packages
pnpm lint    # ESLint across TypeScript packages
pnpm format  # Prettier formatting
pnpm e2e     # Run end-to-end tests
```

```shell
cd packages/create-llama
npm run build  # Build CLI using bash script and ncc
npm run dev    # Watch mode development
npm run e2e    # Playwright tests for generated projects
npm run clean  # Clean build artifacts and template caches
```

```shell
cd python/llama-index-server
uv run generate  # Index data files
fastapi dev      # Start development server with hot reload
pytest           # Run test suite
```

The CLI uses a sophisticated template system in `packages/create-llama/templates/`:
- `types/`: Base project structures (streaming, reflex, llamaindexserver)
- `components/`: Reusable components across frameworks
  - `engines/`: Chat and agent engines
  - `loaders/`: File, web, database loaders
  - `providers/`: AI model configurations
  - `vectordbs/`: Vector database integrations
  - `use-cases/`: Workflow implementations
- Templates support multiple frameworks (Next.js, Express, FastAPI)
- Component system allows mix-and-match functionality
- E2E tests validate generated projects work correctly
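The mix-and-match idea above — a base template type overlaid with selected component directories — can be sketched in a few lines. This is a hypothetical illustration only; the real CLI's copying logic lives in `packages/create-llama` and differs in detail, and the `scaffold` function name and its arguments are invented for this sketch.

```python
import shutil
from pathlib import Path

def scaffold(base: Path, components: list[Path], dest: Path) -> None:
    """Overlay a base template type with selected component directories.

    Hypothetical sketch of the mix-and-match idea, not the CLI's actual code.
    """
    # Copy the base project structure first.
    shutil.copytree(base, dest, dirs_exist_ok=True)
    for component in components:
        # Later copies overwrite earlier files, so a component can
        # specialize files provided by the base template.
        shutil.copytree(component, dest, dirs_exist_ok=True)
```

The overwrite-on-conflict behavior is what lets a vector-database or loader component replace a stub file shipped by the base type.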
- Core: `LlamaIndexServer` class extending FastAPI
- Architecture: Workflow factory pattern for stateless request handling
- UI Generation: AI-powered React component generation from Pydantic schemas
- Development: Hot reloading support with dev mode
Both server frameworks use factory patterns:

```typescript
// TypeScript
const server = new LlamaIndexServer({
  workflow: (context) => createWorkflow(context),
});
```

```python
# Python
def create_workflow(chat_request: ChatRequest) -> Workflow:
    return MyWorkflow(chat_request.messages)
```

Structured events for UI communication:
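The point of the factory pattern is that the server stores only the factory, never a workflow instance, so every request gets fresh per-request state. A minimal stand-alone sketch of that property, using plain-Python stand-ins (`ChatRequest`, `MyWorkflow`, and `handle_request` here are illustrative, not the framework's actual types):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ChatRequest:
    # Stand-in for the framework's request type.
    messages: list[str]

@dataclass
class MyWorkflow:
    # Stand-in workflow holding mutable per-request state.
    messages: list[str]
    steps_run: int = 0

    def run(self) -> str:
        self.steps_run += 1
        return f"handled {len(self.messages)} messages"

WorkflowFactory = Callable[[ChatRequest], MyWorkflow]

def handle_request(factory: WorkflowFactory, request: ChatRequest) -> str:
    # A fresh workflow per request: concurrent requests never share state.
    workflow = factory(request)
    return workflow.run()
```

Because each call to the factory builds a new workflow, no mutable state leaks between requests — which is what makes the request handling stateless from the server's perspective.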
- UIEvent: Custom components with Pydantic/Zod schemas
- ArtifactEvent: Code/documents for Canvas panel
- SourceNodesEvent: Document sources with metadata
- AgentRunEvent: Tool usage and progress tracking
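To make the event shapes concrete, here is a hedged sketch of what a custom UI event might look like on the wire. The real framework derives schemas from Pydantic (Python) or Zod (TypeScript); the `WeatherCardData` payload, the `component` field, and the serialized envelope below are all assumptions for illustration, built with plain dataclasses.

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class WeatherCardData:
    # Hypothetical payload schema for a custom UI component.
    location: str
    temperature_c: float

@dataclass
class UIEvent:
    # Stand-in for the framework's UIEvent: a component name plus typed data.
    component: str
    data: WeatherCardData

    def to_json(self) -> str:
        # Serialized form streamed to the frontend alongside chat tokens.
        return json.dumps({
            "type": "ui_event",
            "component": self.component,
            "data": asdict(self.data),
        })
```

The frontend would match on `component` to pick a React component and feed it the `data` object; the actual field names in create-llama's protocol may differ.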
- Both servers auto-mount `data/` and `output/` directories
- LlamaCloud integration for remote file access
- Static file serving through framework-specific methods
- Playwright tests in `packages/create-llama/e2e/`
- Tests both Python and TypeScript generated projects
- Validates CLI generation and application functionality
- Python: pytest with comprehensive API and service tests
- TypeScript: Integrated testing through build process
- TypeScript compilation with bash script
- ncc bundling for standalone executable
- Template validation and caching
- `prebuild`: Clean directories
- `build`: bunchee compilation to ESM/CJS
- `postbuild`: Next.js preparation and static asset generation
- `prepare:py-static`: Python integration assets
```shell
pnpm release  # Build all + publish npm packages + Python release
```

- Node.js >=16.14.0
- Python with uv package manager
- pnpm for package management
- Clone repository and run `pnpm install`
- For CLI development: work in `packages/create-llama/`
- For server development: choose TypeScript or Python package
- Use `pnpm dev` for concurrent development across packages
- Run `pnpm e2e` to validate changes with generated projects
- Changes to templates require rebuilding CLI
- E2E tests validate template functionality across frameworks
- Template caching system speeds up repeated builds
- Server package builds static assets for Python integration
- Version synchronization between TypeScript and Python packages
- Shared UI components and styling across implementations
- CLI uses caching for template operations
- Server frameworks support streaming responses
- Background processing for file operations and LlamaCloud integration