Commit 64af65a

CLAUDE.md (#340)
1 parent 8a62507 commit 64af65a

2 files changed

Lines changed: 251 additions & 0 deletions

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -101,3 +101,6 @@ queries/**
 
 bench/url_files/*
 !bench/url_files/example_custom_bench.txt
+
+# Tags
+tags

CLAUDE.md

Lines changed: 248 additions & 0 deletions
@@ -0,0 +1,248 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

`dft` (datafusion-dft) is a batteries-included suite of DataFusion applications providing four interfaces: TUI, CLI, FlightSQL Server, and HTTP Server. All interfaces share a common execution engine built on Apache DataFusion and Apache Arrow.

## Building and Running

### Development Commands

```bash
# Build the project (default features: functions-parquet, s3)
cargo build

# Build with all features
cargo build --all-features

# Run the TUI (default interface)
cargo run

# Run CLI with a query
cargo run -- -c "SELECT 1 + 2"

# Start HTTP server (requires http feature)
cargo run --features=http -- serve-http

# Start FlightSQL server (requires flightsql feature)
cargo run --features=flightsql -- serve-flightsql

# Generate TPC-H data
cargo run -- generate-tpch
```

### Testing

Tests are organized by feature and component:

```bash
# Run core database tests
cargo test db

# Run CLI tests
cargo test cli_cases

# Run TUI tests
cargo test tui_cases

# Run feature-specific tests
cargo test --features=flightsql extension_cases::flightsql -- --test-threads=1
cargo test --features=s3 extension_cases::s3
cargo test --features=functions-json extension_cases::functions_json
cargo test --features=deltalake extension_cases::deltalake
cargo test --features=udfs-wasm extension_cases::udfs_wasm

# Run tests for specific crates
cargo test --manifest-path crates/datafusion-app/Cargo.toml --all-features
cargo test --manifest-path crates/datafusion-functions-parquet/Cargo.toml
cargo test --manifest-path crates/datafusion-udfs-wasm/Cargo.toml

# Run a single test
cargo test <test_name>
```

Note: FlightSQL tests require `--test-threads=1` because they spin up servers on the same port.
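
The port conflict behind this note can be reproduced with the standard library alone. The sketch below is illustrative only and does not use the project's test harness; `second_bind_fails` is a hypothetical helper name. It binds a listener to an OS-assigned port, then tries to bind a second listener to the same port while the first is still alive.

```rust
use std::net::TcpListener;

/// Returns true when a second listener cannot bind to a port that is
/// already held by a live listener (hypothetical helper for illustration).
fn second_bind_fails() -> bool {
    // Port 0 asks the OS for any free port, so the sketch does not depend
    // on a specific port being available.
    let first = TcpListener::bind("127.0.0.1:0").expect("bind first listener");
    let port = first.local_addr().expect("read local address").port();

    // `first` is still in scope here, so the port is occupied.
    let second = TcpListener::bind(("127.0.0.1", port));
    second.is_err()
}
```

Two tests that each start a server on one fixed port would hit the same failure if they ran in parallel, which is why the tests are serialized.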

### Code Quality

```bash
# Format code
cargo fmt --all

# Check formatting (CI check)
cargo fmt --all -- --check

# Run clippy
cargo clippy --all-features --workspace -- -D warnings

# Check for unused dependencies
cargo machete

# Check TOML formatting (drop --check to format in place)
taplo format --check
```

## Architecture

### Crate Structure

The project is organized as a workspace with multiple crates:

- **Root crate (`datafusion-dft`)**: Main binary and application logic
  - `src/main.rs` - Entry point that routes to TUI, CLI, or servers
  - `src/tui/` - TUI implementation using ratatui
  - `src/cli/` - CLI implementation
  - `src/server/` - HTTP and FlightSQL server implementations
  - `src/config.rs` - Configuration management
  - `src/args.rs` - Command-line argument parsing

- **`crates/datafusion-app`**: Core execution engine (reusable library)
  - `src/local.rs` - ExecutionContext wrapping DataFusion SessionContext
  - `src/executor/` - Dedicated executors for CPU-intensive work (inspired by InfluxDB)
  - `src/catalog/` - Catalog management
  - `src/extensions/` - DataFusion extensions
  - `src/tables/` - Table provider implementations
  - `src/stats.rs` - Query execution statistics
  - `src/config.rs` - Execution configuration

- **`crates/datafusion-functions-parquet`**: Parquet-specific UDFs

- **`crates/datafusion-udfs-wasm`**: WASM-based UDF support

- **`crates/datafusion-auth`**: Authentication implementations

- **`crates/datafusion-ffi-table-providers`**: FFI table provider support
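
The routing role described for `src/main.rs` can be pictured with a small standard-library sketch. This is not the project's argument parser (that lives in `src/args.rs` and is not reproduced here); the `Mode` enum and `route` function are hypothetical names, and only the subcommands and the `-c` flag shown in the development commands are assumed.

```rust
/// Hypothetical modes mirroring the interfaces listed above.
#[derive(Debug, PartialEq)]
enum Mode {
    Tui,
    Cli { query: String },
    ServeHttp,
    ServeFlightSql,
    GenerateTpch,
}

/// Picks a mode from the arguments that follow the program name.
/// Returns an error message for input this sketch does not recognise.
fn route(args: &[&str]) -> Result<Mode, String> {
    match args {
        // No arguments: fall back to the default interface.
        [] => Ok(Mode::Tui),
        ["-c", query] => Ok(Mode::Cli { query: query.to_string() }),
        ["serve-http"] => Ok(Mode::ServeHttp),
        ["serve-flightsql"] => Ok(Mode::ServeFlightSql),
        ["generate-tpch"] => Ok(Mode::GenerateTpch),
        other => Err(format!("unrecognised arguments: {other:?}")),
    }
}
```

Keeping the decision in one function that returns a plain enum makes the dispatch easy to test without starting a terminal UI or a server.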

### Key Components

#### ExecutionContext
The `ExecutionContext` (in `crates/datafusion-app/src/local.rs`) is the core abstraction that wraps DataFusion's `SessionContext` with:
- Extension registration (UDFs, table formats, object stores)
- DDL file execution
- Dedicated executor for CPU-intensive work
- Query execution and statistics collection
- Observability integration

#### TUI Architecture
The TUI (in `src/tui/`) follows a state-based architecture:
- `src/tui/state/` - Application state management with tab-specific state
- `src/tui/ui/` - Rendering logic separated from state
- `src/tui/handlers/` - Event handling
- `src/tui/execution.rs` - Async query execution

Built with ratatui and crossterm.

#### Server Implementations
- **FlightSQL**: `src/server/flightsql/` - Arrow Flight SQL protocol server
- **HTTP**: `src/server/http/` - REST API using Axum

Both servers share the same `ExecutionContext` from `datafusion-app`.

### Feature Flags

The project uses extensive feature flags to keep binary size manageable:

- `s3` - S3 object store integration (default)
- `functions-parquet` - Parquet-specific functions (default)
- `functions-json` - JSON functions
- `deltalake` - Delta Lake table format support
- `flightsql` - FlightSQL server and client
- `http` - HTTP server
- `huggingface` - HuggingFace dataset integration
- `udfs-wasm` - WASM UDF support
- `observability` - Metrics and tracing (required by servers)

When adding code that depends on a feature, use conditional compilation:
```rust
#[cfg(feature = "flightsql")]
use datafusion_app::flightsql;
```
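
Beyond gating a `use`, a common arrangement is to pair a gated item with a fallback so callers need no `cfg` attributes of their own. The sketch below assumes nothing about the project's code; `flightsql_status` and `enabled_features` are hypothetical names, and only the `flightsql` and `http` feature names come from the list above.

```rust
/// Compiled only when the crate is built with `--features=flightsql`.
#[cfg(feature = "flightsql")]
fn flightsql_status() -> &'static str {
    "flightsql enabled"
}

/// Fallback compiled when the feature is off, so callers never need
/// their own `cfg` attributes.
#[cfg(not(feature = "flightsql"))]
fn flightsql_status() -> &'static str {
    "flightsql disabled"
}

/// `cfg!` expands to a compile-time boolean. Unlike `#[cfg]`, both
/// branches are still type-checked, so it suits small runtime reports.
fn enabled_features() -> Vec<&'static str> {
    let mut features = Vec::new();
    if cfg!(feature = "flightsql") {
        features.push("flightsql");
    }
    if cfg!(feature = "http") {
        features.push("http");
    }
    features
}
```

Use `#[cfg]` when the gated code refers to items that only exist with the feature enabled, and `cfg!` when both branches compile either way.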

### Configuration

Configuration files use TOML and are located in `~/.config/dft/`. Key config files:
- Main config: `~/.config/dft/config.toml`
- DDL file: `~/.config/dft/ddl.sql` (auto-loaded by TUI)

See `src/config.rs` and `crates/datafusion-app/src/config.rs` for configuration structure.
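
As a rough illustration of the file layout above, the two paths can be derived from a home directory using only the standard library. `ConfigPaths` and `config_paths` are hypothetical names; the real lookup logic lives in `src/config.rs` and may behave differently, for example by honoring overrides.

```rust
use std::path::{Path, PathBuf};

/// Locations of the two files named above (hypothetical helper type).
struct ConfigPaths {
    config: PathBuf,
    ddl: PathBuf,
}

/// Builds both paths under `<home>/.config/dft/`.
fn config_paths(home: &Path) -> ConfigPaths {
    let dir = home.join(".config").join("dft");
    ConfigPaths {
        config: dir.join("config.toml"),
        ddl: dir.join("ddl.sql"),
    }
}
```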

### Executor Pattern

The project uses a dedicated executor pattern (inspired by InfluxDB) for CPU-intensive work. This separates network I/O (on the main Tokio runtime) from CPU-bound query execution. See `crates/datafusion-app/src/executor/`.

The main runtime in `src/main.rs` uses a single-threaded Tokio runtime optimized for network I/O.
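
The idea of moving CPU-bound work off the I/O thread can be sketched with plain threads and channels. This is a standard-library analogue and not the implementation in `crates/datafusion-app/src/executor/`, which is built around Tokio; the `DedicatedExecutor` type below and its methods are assumptions made for illustration.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// A boxed CPU-bound job.
type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical stand-in for a dedicated executor: one worker thread
/// that runs jobs away from the caller's thread.
struct DedicatedExecutor {
    jobs: Option<Sender<Job>>,
    worker: Option<JoinHandle<()>>,
}

impl DedicatedExecutor {
    fn new() -> Self {
        let (jobs, inbox) = mpsc::channel::<Job>();
        let worker = thread::Builder::new()
            .name("cpu-worker".to_string())
            .spawn(move || {
                // Runs until every sender has been dropped.
                for job in inbox {
                    job();
                }
            })
            .expect("spawn worker thread");
        Self { jobs: Some(jobs), worker: Some(worker) }
    }

    /// Runs `work` on the worker thread and blocks until it replies.
    /// An async caller would await a oneshot channel here instead.
    fn run<T, F>(&self, work: F) -> T
    where
        T: Send + 'static,
        F: FnOnce() -> T + Send + 'static,
    {
        let (reply, result) = mpsc::channel();
        let job: Job = Box::new(move || {
            // The receiver may be gone if the caller gave up; ignore that.
            let _ = reply.send(work());
        });
        self.jobs
            .as_ref()
            .expect("executor is running")
            .send(job)
            .expect("worker thread is alive");
        result.recv().expect("worker sent a result")
    }
}

impl Drop for DedicatedExecutor {
    fn drop(&mut self) {
        // Dropping the sender ends the worker loop; then wait for it to exit.
        self.jobs.take();
        if let Some(worker) = self.worker.take() {
            let _ = worker.join();
        }
    }
}
```

A call such as `executor.run(|| (1..=10u64).sum::<u64>())` computes the sum on the worker thread, so the thread that accepts connections is never tied up by the computation itself.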

## Development Workflow

### Adding New Features

1. Update `Cargo.toml` feature flags if needed
2. Add the feature to CI test matrix in `.github/workflows/test.yml`
3. Implement feature in appropriate crate
4. Add tests in the `extension_cases` suite or the relevant crate's test suite
5. Update documentation

### Testing Against LocalStack

Some tests (S3, TUI, CLI) require LocalStack for S3 testing. The CI workflow shows the setup:

```bash
# Start LocalStack
localstack start -d
awslocal s3api create-bucket --bucket tmp --acl public-read
awslocal s3 mv data/aggregate_test_100.csv s3://tmp/

# Run tests
cargo test --features=s3 extension_cases::s3
```

### Benchmarking

The project includes benchmarking support:

```bash
# Start HTTP server for benchmarking
just serve-http

# Run basic HTTP benchmark (requires oha tool)
just bench-http-basic

# Run custom benchmark
just bench-http-custom <file>
```

Criterion benchmarks are available in `crates/datafusion-app/benches/`.

## Common Patterns

### Adding a New UDF

1. Implement in `crates/datafusion-app/src/extensions/`
2. Register in the appropriate extension registration function
3. Add tests with SQL queries exercising the UDF

### Adding Table Provider Support

1. Implement in `crates/datafusion-app/src/tables/`
2. Register in catalog creation (`crates/datafusion-app/src/catalog/`)
3. Add integration tests

### Working with the TUI

The TUI uses a tab-based interface. Each tab has:
- State struct in `src/tui/state/tabs/`
- UI rendering in `src/tui/ui/tabs/`
- Event handlers in `src/tui/handlers/`

When modifying TUI code, ensure proper separation between state management and rendering.
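
The separation asked for here can be shown without any terminal library. The sketch below is a simplified model, not the project's TUI code: `EditorTabState`, `Event`, `handle_event`, and `render` are hypothetical names, and the real implementation draws ratatui widgets and receives crossterm events instead of returning strings.

```rust
/// State for one hypothetical tab: plain data, no drawing code.
#[derive(Debug, Default)]
struct EditorTabState {
    input: String,
    submitted: Vec<String>,
}

/// Simplified stand-ins for the key events a handler might receive.
enum Event {
    Char(char),
    Backspace,
    Enter,
}

/// Handler: the only place where state is mutated.
fn handle_event(state: &mut EditorTabState, event: Event) {
    match event {
        Event::Char(c) => state.input.push(c),
        Event::Backspace => {
            state.input.pop();
        }
        Event::Enter => {
            if !state.input.is_empty() {
                // Move the text out of the input buffer, leaving it empty.
                let query = std::mem::take(&mut state.input);
                state.submitted.push(query);
            }
        }
    }
}

/// Rendering: a read-only function of state that returns the lines to draw.
fn render(state: &EditorTabState) -> Vec<String> {
    vec![
        format!("> {}", state.input),
        format!("{} queries submitted", state.submitted.len()),
    ]
}
```

Because `render` takes `&EditorTabState`, the compiler rejects any attempt to mutate state while drawing, which enforces the separation this section calls for.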

## Important Notes

- The project is licensed under Apache 2.0
- Clippy lint `clone_on_ref_ptr` is set to "deny"
- The main Tokio runtime is single-threaded and should only be used for network I/O
- CPU-intensive query execution uses dedicated executors
- FlightSQL tests must run with `--test-threads=1` due to port conflicts
- All server implementations share the same execution engine
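
For the `clone_on_ref_ptr` note, the practical effect is that reference-counted pointers are cloned with the explicit associated-function form. The sketch below is a generic illustration with a hypothetical `share` function, not code from this repository.

```rust
use std::sync::Arc;

/// Hands out another owner of the same allocation without copying the data.
fn share(names: &Arc<Vec<String>>) -> Arc<Vec<String>> {
    // `names.clone()` would compile, but `clone_on_ref_ptr` flags it because
    // the method form hides that only the pointer is cloned.
    // The explicit form below passes the lint.
    Arc::clone(names)
}
```

The same convention applies to `Rc` and to the `Weak` pointer types.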
