This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
dft (datafusion-dft) is a batteries-included suite of DataFusion applications providing four interfaces: TUI, CLI, FlightSQL Server, and HTTP Server. All interfaces share a common execution engine built on Apache DataFusion and Apache Arrow.
# Build the project (default features: functions-parquet, s3)
cargo build
# Build with TUI support
cargo build --features=tui
# Build with all features
cargo build --all-features
# Run the TUI (requires tui feature)
cargo run --features=tui
# Run CLI with a query
cargo run -- -c "SELECT 1 + 2"
# Start HTTP server (requires http feature)
cargo run --features=http -- serve-http
# Start FlightSQL server (requires flightsql feature)
cargo run --features=flightsql -- serve-flightsql
# Generate TPC-H data
cargo run -- generate-tpch

Benchmarks measure query performance with detailed timing breakdowns:
# Serial benchmark (default, 10 iterations)
cargo run -- -c "SELECT 1" --bench
# Custom iteration count
cargo run -- -c "SELECT 1" --bench -n 100
# Concurrent benchmark (measures throughput under load)
cargo run -- -c "SELECT 1" --bench --concurrent
# With custom iterations and concurrency
cargo run -- -c "SELECT 1" --bench -n 100 --concurrent
# Save results to CSV
cargo run -- -c "SELECT 1" --bench --save results.csv
# Append to existing results
cargo run -- -c "SELECT 2" --bench --concurrent --save results.csv --append
# Warm up cache before benchmarking
cargo run -- -c "SELECT * FROM t" --bench --run-before "CREATE TABLE t AS VALUES (1)"

Benchmark Modes:
- Serial (default): Measures query performance in isolation
  - Shows pure query execution time without contention
  - Ideal for understanding baseline performance
- Concurrent (--concurrent): Measures performance under load
  - Runs iterations in parallel (concurrency = min(iterations, CPU cores))
  - Shows throughput (queries/second) with multiple clients
  - Reveals resource contention and bottlenecks
  - Higher mean/median times are expected due to concurrent load
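The concurrency rule above can be sketched as follows. This is only an illustration of `min(iterations, CPU cores)`, not the project's code: `concurrency_level` is a hypothetical helper, and clamping the result to at least 1 is an added assumption so that a zero iteration count never yields zero workers.

```rust
use std::thread;

/// Hypothetical helper: how many queries run at once in concurrent mode.
/// Follows the documented rule min(iterations, CPU cores); the lower bound
/// of 1 is an assumption added for this sketch.
fn concurrency_level(iterations: usize, cpu_cores: usize) -> usize {
    iterations.min(cpu_cores).max(1)
}

fn main() {
    // available_parallelism() can fail on some platforms; fall back to 1.
    let cores = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);

    for iterations in [1, 10, 100] {
        println!(
            "iterations={iterations} cores={cores} -> concurrency={}",
            concurrency_level(iterations, cores)
        );
    }
}
```

With few iterations the iteration count is the limit; with many, the core count is.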
Output:
- Timing breakdown: logical planning, physical planning, execution, total
- Statistics: min, max, mean, median for each phase
- CSV format includes a concurrency_mode column (serial or concurrent(N))
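To make the reported statistics concrete, here is a minimal sketch of how min, max, mean, and median could be computed for one phase, and how a `concurrency_mode` value could be rendered as `serial` or `concurrent(N)`. The names `Summary`, `summarize`, and `ConcurrencyMode` are hypothetical and are not taken from the dft codebase.

```rust
use std::fmt;
use std::time::Duration;

/// Hypothetical summary of one timing phase (e.g. execution).
#[derive(Debug, PartialEq)]
struct Summary {
    min: Duration,
    max: Duration,
    mean: Duration,
    median: Duration,
}

/// Compute min/max/mean/median; returns None when there are no samples.
fn summarize(samples: &[Duration]) -> Option<Summary> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    let n = sorted.len();
    let total: Duration = sorted.iter().sum();
    let mean = total / n as u32;
    // For an even count, the median is the average of the two middle samples.
    let median = if n % 2 == 1 {
        sorted[n / 2]
    } else {
        (sorted[n / 2 - 1] + sorted[n / 2]) / 2
    };
    Some(Summary { min: sorted[0], max: sorted[n - 1], mean, median })
}

/// Hypothetical representation of the concurrency_mode CSV column.
enum ConcurrencyMode {
    Serial,
    Concurrent(usize),
}

impl fmt::Display for ConcurrencyMode {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConcurrencyMode::Serial => write!(f, "serial"),
            ConcurrencyMode::Concurrent(n) => write!(f, "concurrent({n})"),
        }
    }
}

fn main() {
    let samples: Vec<Duration> =
        [1, 2, 3, 10].iter().map(|ms| Duration::from_millis(*ms)).collect();
    let summary = summarize(&samples).expect("samples are not empty");
    println!("{} {summary:?}", ConcurrencyMode::Concurrent(8));
}
```

The median is less sensitive than the mean to a single slow iteration, which is one reason to report both.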
FlightSQL Benchmarks:
# Benchmark FlightSQL server (requires --flightsql flag and server running)
cargo run -- -c "SELECT 1" --bench --flightsql --concurrent

Tests are organized by feature and component:
# Run core database tests
cargo test db
# Run CLI tests
cargo test cli_cases
# Run TUI tests (requires tui feature)
cargo test --features=tui tui_cases
# Run feature-specific tests
cargo test --features=flightsql extension_cases::flightsql -- --test-threads=1
cargo test --features=s3 extension_cases::s3
cargo test --features=functions-json extension_cases::functions_json
cargo test --features=deltalake extension_cases::deltalake
cargo test --features="deltalake s3" extension_cases::deltalake::test_deltalake_s3 # Requires LocalStack
cargo test --features=udfs-wasm extension_cases::udfs_wasm
cargo test --features=vortex extension_cases::vortex
cargo test --features=vortex cli_cases::basic::test_output_vortex
# Run tests for specific crates
cargo test --manifest-path crates/datafusion-app/Cargo.toml --all-features
cargo test --manifest-path crates/datafusion-functions-parquet/Cargo.toml
cargo test --manifest-path crates/datafusion-udfs-wasm/Cargo.toml
# Run a single test
cargo test <test_name>

Note: FlightSQL tests require --test-threads=1 because they spin up servers on the same port.
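The port conflict behind this requirement can be reproduced with the standard library alone. The sketch below is not taken from the test suite, and `second_bind_fails` is a hypothetical helper. It binds a listener to an OS-assigned port and then tries to bind a second listener to the same address, which is what happens when two tests start a server on one port at the same time.

```rust
use std::io;
use std::net::TcpListener;

/// Hypothetical helper: returns Ok(true) if binding a second listener to an
/// address that is already in use fails.
fn second_bind_fails() -> io::Result<bool> {
    // Port 0 asks the OS for any free port, so the sketch does not assume
    // a specific port is available.
    let first = TcpListener::bind("127.0.0.1:0")?;
    let addr = first.local_addr()?;

    // `first` is still alive and listening here, so this bind should be rejected.
    let second = TcpListener::bind(addr);
    Ok(second.is_err())
}

fn main() -> io::Result<()> {
    println!("second bind failed: {}", second_bind_fails()?);
    Ok(())
}
```

Running such tests one at a time avoids the conflict, because each server releases its port before the next test starts.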
# Format code
cargo fmt --all
# Check formatting (CI check)
cargo fmt --all -- --check
# Run clippy
cargo clippy --all-features --workspace -- -D warnings
# Check for unused dependencies
cargo machete
# Format TOML files
taplo format --check

The project is organized as a workspace with multiple crates:
- Root crate (datafusion-dft): Main binary and application logic
  - src/main.rs - Entry point that routes to TUI, CLI, or servers
  - src/tui/ - TUI implementation using ratatui
  - src/cli/ - CLI implementation
  - src/server/ - HTTP and FlightSQL server implementations
  - src/config.rs - Configuration management
  - src/args.rs - Command-line argument parsing
- crates/datafusion-app: Core execution engine (reusable library)
  - src/local.rs - ExecutionContext wrapping DataFusion SessionContext
  - src/executor/ - Dedicated executors for CPU-intensive work (inspired by InfluxDB)
  - src/catalog/ - Catalog management
  - src/extensions/ - DataFusion extensions
  - src/tables/ - Table provider implementations
  - src/stats.rs - Query execution statistics
  - src/config.rs - Execution configuration
- crates/datafusion-functions-parquet: Parquet-specific UDFs
- crates/datafusion-udfs-wasm: WASM-based UDF support
- crates/datafusion-auth: Authentication implementations
- crates/datafusion-ffi-table-providers: FFI table provider support
The ExecutionContext (in crates/datafusion-app/src/local.rs) is the core abstraction that wraps DataFusion's SessionContext with:
- Extension registration (UDFs, table formats, object stores)
- DDL file execution
- Dedicated executor for CPU-intensive work
- Query execution and statistics collection
- Observability integration
The TUI (in src/tui/) follows a state-based architecture:
- src/tui/state/ - Application state management with tab-specific state
- src/tui/ui/ - Rendering logic separated from state
- src/tui/handlers/ - Event handling
- src/tui/execution.rs - Async query execution
Built with ratatui and crossterm.
- FlightSQL: src/server/flightsql/ - Arrow Flight SQL protocol server
- HTTP: src/server/http/ - REST API using Axum
Both servers share the same ExecutionContext from datafusion-app.
The project uses extensive feature flags to keep binary size manageable:
- tui - Terminal user interface (ratatui-based)
- s3 - S3 object store integration (default)
- functions-parquet - Parquet-specific functions (default)
- functions-json - JSON functions
- deltalake - Delta Lake table format support
- vortex - Vortex file format support
- flightsql - FlightSQL server and client
- http - HTTP server
- huggingface - HuggingFace dataset integration
- udfs-wasm - WASM UDF support
- observability - Metrics and tracing (required by servers)
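Each flag in this list is a Cargo feature, so code can be included or excluded at compile time. The sketch below shows two common ways to do that: paired `#[cfg(...)]` items that provide a fallback when a feature is off, and the `cfg!` macro for a runtime-visible check. `flightsql_status` and `enabled_features` are hypothetical names used only for illustration. Compiled on its own, outside Cargo, no features are set, so the fallback branch is the one that builds.

```rust
/// Hypothetical function, compiled only when the `flightsql` feature is on.
#[cfg(feature = "flightsql")]
fn flightsql_status() -> &'static str {
    "flightsql enabled"
}

/// Fallback with the same signature, compiled when the feature is off, so
/// callers do not need their own #[cfg] attributes.
#[cfg(not(feature = "flightsql"))]
fn flightsql_status() -> &'static str {
    "flightsql disabled"
}

/// Hypothetical helper: cfg! evaluates to a compile-time boolean, which is
/// convenient for reporting which features a binary was built with.
fn enabled_features() -> Vec<&'static str> {
    let mut features = Vec::new();
    if cfg!(feature = "tui") {
        features.push("tui");
    }
    if cfg!(feature = "flightsql") {
        features.push("flightsql");
    }
    if cfg!(feature = "http") {
        features.push("http");
    }
    features
}

fn main() {
    println!("{}", flightsql_status());
    println!("enabled: {:?}", enabled_features());
}
```

Providing a `not(feature = ...)` counterpart keeps the default build compiling when the gated item is referenced from ungated code.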
When adding code that depends on a feature, use conditional compilation:
#[cfg(feature = "flightsql")]
use datafusion_app::flightsql;

Configuration files use TOML and are located in ~/.config/dft/. Key config files:
- Main config: ~/.config/dft/config.toml
- DDL file: ~/.config/dft/ddl.sql (auto-loaded by TUI)
See src/config.rs and crates/datafusion-app/src/config.rs for configuration structure.
The project uses a dedicated executor pattern (inspired by InfluxDB) for CPU-intensive work. This separates network I/O (on the main Tokio runtime) from CPU-bound query execution. See crates/datafusion-app/src/executor/.
The main runtime in src/main.rs uses a single-threaded Tokio runtime optimized for network I/O.
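The idea of handing CPU-bound work to a dedicated executor can be illustrated with the standard library alone. The sketch below is not the project's implementation, which lives in crates/datafusion-app/src/executor/ and builds on Tokio. `CpuExecutor` and its methods are hypothetical names, and a single worker thread stands in for a separate runtime.

```rust
use std::sync::mpsc;
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical executor: one worker thread that runs submitted jobs, so the
/// caller's thread stays free for other work such as network I/O.
struct CpuExecutor {
    sender: Option<mpsc::Sender<Job>>,
    worker: Option<thread::JoinHandle<()>>,
}

impl CpuExecutor {
    fn new(name: &str) -> Self {
        let (sender, receiver) = mpsc::channel::<Job>();
        let worker = thread::Builder::new()
            .name(name.to_string())
            .spawn(move || {
                // The loop ends once every sender has been dropped.
                for job in receiver {
                    job();
                }
            })
            .expect("failed to spawn worker thread");
        Self { sender: Some(sender), worker: Some(worker) }
    }

    /// Submit CPU-bound work; the returned receiver yields its result.
    fn spawn<T, F>(&self, work: F) -> mpsc::Receiver<T>
    where
        F: FnOnce() -> T + Send + 'static,
        T: Send + 'static,
    {
        let (tx, rx) = mpsc::channel();
        let job: Job = Box::new(move || {
            // The caller may have dropped the receiver; ignore that case.
            let _ = tx.send(work());
        });
        self.sender
            .as_ref()
            .expect("executor already shut down")
            .send(job)
            .expect("worker thread has exited");
        rx
    }
}

impl Drop for CpuExecutor {
    fn drop(&mut self) {
        // Close the channel so the worker loop ends, then wait for it.
        drop(self.sender.take());
        if let Some(worker) = self.worker.take() {
            let _ = worker.join();
        }
    }
}

fn main() {
    let executor = CpuExecutor::new("cpu-worker");
    let result = executor.spawn(|| (1..=1_000u64).sum::<u64>());
    println!("sum = {}", result.recv().unwrap());
}
```

The caller only waits when it asks for the result, which mirrors the separation between the I/O runtime and query execution described above.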
Adding a new feature:
- Update Cargo.toml feature flags if needed
- Add the feature to the CI test matrix in .github/workflows/test.yml
- Implement the feature in the appropriate crate
- Add tests in the extension_cases or crate test suites
- Update documentation
Some tests (S3, TUI, CLI, Delta Lake + S3) require LocalStack for S3 testing. The CI workflow shows the setup:
# Start LocalStack
localstack start -d
awslocal s3api create-bucket --bucket test --acl public-read
awslocal s3 mv data/aggregate_test_100.csv s3://test/
# Run S3 tests
cargo test --features=s3 extension_cases::s3
# For Delta Lake + S3 tests, also sync the delta lake data
awslocal s3 sync data/deltalake/simple_table s3://test/deltalake/simple_table
cargo test --features="deltalake s3" extension_cases::deltalake::test_deltalake_s3

The project includes benchmarking support:
# Start HTTP server for benchmarking
just serve-http
# Run basic HTTP benchmark (requires oha tool)
just bench-http-basic
# Run custom benchmark
just bench-http-custom <file>

Criterion benchmarks are available in crates/datafusion-app/benches/.
Adding a new UDF:
- Implement in crates/datafusion-app/src/extensions/
- Register in the appropriate extension registration function
- Add tests with SQL queries exercising the UDF

Adding a new table provider:
- Implement in crates/datafusion-app/src/tables/
- Register in catalog creation (src/catalog/)
- Add integration tests
The TUI uses a tab-based interface. Each tab has:
- State struct in src/tui/state/tabs/
- UI rendering in src/tui/ui/tabs/
- Event handlers in src/tui/handlers/
When modifying TUI code, ensure proper separation between state management and rendering.
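A minimal sketch of that separation, using only the standard library: the handler is the only place that mutates state, and rendering is a pure function of the state. `SqlTabState`, `Event`, `handle_event`, and `render` are hypothetical names, and rendering to a `String` stands in for drawing ratatui widgets.

```rust
/// Hypothetical tab state: holds data only, with no rendering logic.
#[derive(Debug, Default)]
struct SqlTabState {
    query: String,
    running: bool,
}

/// Hypothetical input events delivered to the tab.
enum Event {
    Key(char),
    Backspace,
    Submit,
}

/// Event handler: the only place where state is mutated.
fn handle_event(state: &mut SqlTabState, event: Event) {
    match event {
        Event::Key(c) => state.query.push(c),
        Event::Backspace => {
            state.query.pop();
        }
        Event::Submit => state.running = true,
    }
}

/// Rendering: takes the state by shared reference and never changes it.
fn render(state: &SqlTabState) -> String {
    let status = if state.running { "running" } else { "idle" };
    format!("[{status}] > {}", state.query)
}

fn main() {
    let mut state = SqlTabState::default();
    for c in "SELECT 1".chars() {
        handle_event(&mut state, Event::Key(c));
    }
    println!("{}", render(&state));
    handle_event(&mut state, Event::Submit);
    println!("{}", render(&state));
}
```

Because `render` takes `&SqlTabState`, the compiler rejects accidental state changes during drawing, and handlers can be tested without a terminal.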
- The project is licensed under Apache 2.0
- Clippy lint clone_on_ref_ptr is set to "deny"
- The main Tokio runtime should only be used for network I/O (single-threaded)
- CPU-intensive query execution uses dedicated executors
- FlightSQL tests must run with --test-threads=1 due to port conflicts
- All server implementations share the same execution engine