This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
ArcticDB is a high-performance, serverless DataFrame database for the Python Data Science ecosystem. It provides a Python API backed by a C++ data-processing and compression engine, supporting S3, LMDB, Azure Blob Storage, and MongoDB backends.
Technical documentation in docs/claude/ is owned and maintained by Claude. Consult these documents when working on related areas.
- Read the relevant doc when starting work in an area (e.g., read `CACHING.md` before modifying the version map cache)
- Update the doc only when making changes to that area
- Do NOT proactively read or update docs for unrelated areas
Keep documentation high-level and terse:
- Reference `file_path:ClassName:method_name` instead of copying code
- Use tables and bullet points over code blocks
- Keep conceptual diagrams; remove implementation details
- Avoid duplicating what's already in source code
| Area | Document |
|---|---|
| Architecture | docs/claude/ARCHITECTURE.md |
| C++ modules | docs/claude/cpp/ (CACHING, VERSIONING, STORAGE_BACKENDS, ENTITY, CODEC, COLUMN_STORE, PIPELINE, PROCESSING, STREAM, ASYNC, PYTHON_BINDINGS) |
| Python modules | docs/claude/python/ (ARCTIC_CLASS, LIBRARY_API, NATIVE_VERSION_STORE, QUERY_PROCESSING, NORMALIZATION, ADAPTERS, TOOLBOX) |
Check CLAUDE_USER_SETTINGS.md (git-ignored) for user-specific configuration:
- Python virtual environment paths (Claude should use its own venvs, not user's)
- Preferred CMake presets for debug/release/profiling builds
The vcpkg-based build requires certain system packages that may not be installed by default:

```shell
sudo apt install pkg-config flex bison libsasl2-dev -y
```

Initialize git submodules (required for vcpkg):

```shell
git submodule update --init --recursive
```

Copy Makefile.local.example to Makefile.local for Man-specific settings (proxy, TMPDIR, protobuf version). If Makefile.local is missing, prompt the user to create it, using Makefile.local.example as a template.
If VIRTUAL_ENV is not set:
- Ask for the NAME they want to use for the venv
- If it already exists in `~/venvs/<NAME>`, inform the user. They can either use it as-is, or you can run `make setup NAME=<name> CLEAN=1` to recreate it.
- Otherwise, if it does not already exist, create it with `make setup NAME=<name>`.
Do not warn the user that it will take a while - it's usually fast.
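The decision above can be sketched as a small helper (hypothetical, not part of the repo; `~/venvs` is the convention named in this doc):

```python
from pathlib import Path

def suggested_setup_command(name: str) -> str:
    """Return the make invocation to suggest for a venv NAME."""
    venv_dir = Path.home() / "venvs" / name
    if venv_dir.exists():
        # Existing venv: use as-is, or recreate with CLEAN=1
        return f"make setup NAME={name} CLEAN=1  # only if recreating"
    return f"make setup NAME={name}"
```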
The venv must be activated before running any make target or command that uses Python (protoc, lint, lint-check, test-py, bench-py, wheel). Prefix every such command with activation:

```shell
source $(make activate NAME=<name>) && make test-py
```

A root Makefile provides shortcuts for common tasks. User-specific overrides (presets, proxy, TMPDIR) go in Makefile.local (git-ignored; see Makefile.local.example).
| Target | Description | Key variables |
|---|---|---|
| `make help` | List all targets and current variable values | |
| `make setup NAME=x` | Full setup from scratch: submodules, venv, protoc, build, symlink | `CLEAN=1` to replace existing venv |
| `make protoc` | Generate protobuf stubs | `PROTOC_VERS`, `PROXY_CMD` |
| `make venv NAME=x` | Create a dev venv with all deps (`CLEAN=1` to replace existing) | `VENV_DIR`, `PROXY_CMD` |
| `make activate NAME=x` | Print activate path. Use: `source $(make activate NAME=x)` | `VENV_DIR` |
| `make lint` | Run formatters in-place | |
| `make lint-check` | Check formatting without changes | |
| `make build` / `make build-debug` | Configure, build, and symlink `arcticdb_ext` | `RELEASE_PRESET` / `DEBUG_PRESET`, `CMAKE_JOBS` |
| `make configure` / `make configure-debug` | CMake configure only | |
| `make test-cpp` / `make test-cpp-debug` | Build and run C++ unit tests | `FILTER=` for gtest_filter |
| `make symlink` / `make symlink-debug` | Symlink built extension into `python/` | |
| `make test-py` | Run Python tests | `TYPE=unit\|integration\|...`, `FILE=` path to file/test, `ARGS=` |
| `make build-and-test-py` | Release build + symlink + Python tests | `RELEASE_PRESET`, `CMAKE_JOBS`, `TYPE=`, `FILE=`, `ARGS=` |
| `make build-and-test-py-debug` | Debug build + symlink + Python tests | `DEBUG_PRESET`, `CMAKE_JOBS`, `TYPE=`, `FILE=`, `ARGS=` |
| `make wheel` | Build a pip wheel into `dist/` | |
| `make bench-cpp` | Build and run C++ benchmarks | `FILTER=` |
| `make install-editable` | Install arcticdb in editable mode (no C++ rebuild) | |
| `make bench-py` | Run ASV Python benchmarks (runs install-editable first) | `BENCH=` |
Key presets in cpp/CMakePresets.json:
- `linux-debug` / `linux-release` - Linux with vcpkg
- `linux-conda-debug` / `linux-conda-release` - Linux with conda-forge deps (set `ARCTICDB_USING_CONDA=1`)
- `windows-cl-debug` / `windows-cl-release` - Windows with MSVC
- `macos-debug` / `macos-release` - macOS
User-specific presets can be defined in cpp/CMakeUserPresets.json (git-ignored).
The project uses several git submodules. Do not directly edit files inside submodule directories - instead update the submodule reference.
| Submodule | Path | Purpose |
|---|---|---|
| vcpkg | `cpp/vcpkg` | Package manager with custom ports (e.g., arcticdb-sparrow) |
| pybind11 | `cpp/third_party/pybind11` | Python bindings |
| lmdb | `cpp/third_party/lmdb` | LMDB storage backend |
| lmdbxx | `cpp/third_party/lmdbxx` | C++ wrapper for LMDB |
| recycle | `cpp/third_party/recycle` | Memory recycling |
| rapidcheck | `cpp/third_party/rapidcheck` | Property-based testing |
| entt | `cpp/third_party/entt` | Entity component system |
When upgrading a dependency like sparrow that has a custom port in vcpkg:

1. Fetch and check out the vcpkg commit containing the new version:

   ```shell
   cd cpp/vcpkg
   git fetch origin
   git log --oneline origin/master | grep -i <package-name>  # find the commit
   git checkout <commit-hash>
   cd ../..
   ```

2. Update the version override in `cpp/vcpkg.json`:

   ```json
   "overrides": [
     { "name": "arcticdb-sparrow", "version": "X.Y.Z" }
   ]
   ```

3. Update the conda environment in environment-dev.yml if applicable.

4. Rebuild - vcpkg will fetch the new version on the next build.
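Step 2 can also be scripted; a stdlib-only sketch of editing the parsed manifest (the `set_override` helper is hypothetical, and the version numbers are placeholders):

```python
import json

def set_override(manifest: dict, name: str, version: str) -> dict:
    """Set or replace a version override entry in a parsed vcpkg.json."""
    overrides = manifest.setdefault("overrides", [])
    for entry in overrides:
        if entry.get("name") == name:
            entry["version"] = version
            break
    else:
        overrides.append({"name": name, "version": version})
    return manifest

# Illustrative manifest fragment, not the repo's actual vcpkg.json
manifest = {"overrides": [{"name": "arcticdb-sparrow", "version": "1.0.0"}]}
set_override(manifest, "arcticdb-sparrow", "1.1.0")
updated = json.dumps(manifest["overrides"])
```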
C++ benchmark sources are in cpp/arcticdb/*/test/benchmark_*.cpp. ASV Python benchmarks live in python/benchmarks/. See ASV Benchmarks Wiki.
When writing or modifying code, follow the standards in docs/claude/PR_REVIEW_GUIDELINES.md. These cover API stability, memory safety, on-disk format compatibility, concurrency, testing, and other quality gates enforced during PR review.
- Data written by newer clients should be readable by older clients - document breaking changes clearly
- API changes affecting V1 or V2 public APIs must be highlighted in PR descriptions
Code style is enforced by `make lint`. Always run `make lint` after making code changes.
- Do not add "Generated with AI" or "Co-Authored-By" lines to commit messages