CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

VTS (Vector Transport Service) is an open-source tool for moving vectors and unstructured data, built on Apache SeaTunnel. It synchronizes data between vector databases, traditional search engines, and other data stores.

Build Commands

Maven Build (mvnd recommended for faster builds)

# Build distribution package (fastest, recommended)
mvnd clean package -pl :seatunnel-dist -am -D"skip.ui"=true -Dmaven.test.skip=true -Prelease

# Compile the entire project (no packaging)
mvnd clean compile -Dmaven.test.skip=true

# Compile and run tests (if needed)
mvnd clean test -Dmaven.test.skip=false

# Build specific module
mvnd clean package -pl seatunnel-connectors-v2/connector-milvus -am -Dmaven.test.skip=true

# Using the Maven wrapper instead of mvnd (slower)
./mvnw clean package -pl :seatunnel-dist -am -D"skip.ui"=true -Dmaven.test.skip=true -Prelease

SeaTunnel Engine Commands

# Start cluster mode (recommended)
mkdir -p ./logs
./bin/seatunnel-cluster.sh -d

# Run job in cluster mode
./bin/seatunnel.sh --config ./path/to/config.conf

# Run job in local mode
./bin/seatunnel.sh --config ./path/to/config.conf -m local

# Stop cluster
./bin/stop-seatunnel-cluster.sh

UI Development (seatunnel-engine-ui)

cd seatunnel-engine/seatunnel-engine-ui

# Install dependencies
npm install

# Run development server
npm run dev

# Build for production
npm run build

# Run tests
npm run test:unit

# Run E2E tests
npm run test:e2e

# Lint code
npm run lint

# Format code
npm run format

Testing

Unit Tests

# Run all unit tests
./mvnw test

# Run tests for specific module
./mvnw test -pl seatunnel-engine/seatunnel-engine-core

# Run specific test class
./mvnw test -Dtest=ClassName

# Skip unit tests
./mvnw compile -DskipUT=true

Integration Tests

# Run integration tests (disabled by default)
./mvnw verify -DskipIT=false

# Run E2E tests for connectors
./mvnw test -pl seatunnel-e2e/seatunnel-connector-v2-e2e

Test Frameworks

  • JUnit 5 - Primary testing framework
  • Mockito - Mocking framework
  • TestContainers - Integration testing with Docker containers
  • Awaitility - Asynchronous testing utilities

Code Quality

Code Formatting

# Check code style with Spotless
./mvnw spotless:check

# Apply code formatting
./mvnw spotless:apply

Pre-commit Hooks

# Run pre-commit checks
./tools/spotless_check/pre-commit.sh

Architecture Overview

Core Modules

  • seatunnel-api - Core API definitions and interfaces
  • seatunnel-engine - SeaTunnel execution engine (cluster management, job execution)
  • seatunnel-connectors-v2 - Data connectors (Milvus, Elasticsearch, Pinecone, etc.)
  • seatunnel-transforms-v2 - Data transformation components
  • seatunnel-translation - Translation layer between APIs and execution engines

Engine Components

  • seatunnel-engine-core - Core engine logic and execution
  • seatunnel-engine-server - Server-side cluster management
  • seatunnel-engine-client - Client API for job submission
  • seatunnel-engine-ui - Web-based management interface (Vue.js)

Data Flow

  1. Source Connectors - Read data from various sources (Milvus, ES, etc.)
  2. Transforms - Process and transform data (field mapping, embedding, etc.)
  3. Sink Connectors - Write data to target systems
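
The three stages above can be sketched in plain Java. This is only an illustration of the source, transform, sink ordering: `DataFlowSketch` and its `run` method are hypothetical names and are not part of seatunnel-api, which defines its own source and sink interfaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical stand-in for a pipeline; none of these names come from seatunnel-api.
public class DataFlowSketch {

    /** Reads every row from the source, applies the transform, and hands the result to the sink. */
    public static <I, O> int run(
            Supplier<List<I>> source, Function<I, O> transform, Consumer<O> sink) {
        int written = 0;
        for (I row : source.get()) {
            sink.accept(transform.apply(row));
            written++;
        }
        return written;
    }

    public static void main(String[] args) {
        List<String> target = new ArrayList<>();
        int written = run(
                () -> Arrays.asList("a", "b", "c"), // source: an in-memory list
                s -> s.toUpperCase(),               // transform: a simple field rewrite
                target::add);                       // sink: collect into a list
        System.out.println(written + " rows written: " + target);
        // prints "3 rows written: [A, B, C]"
    }
}
```

In a real job the same three roles are declared in the HOCON job configuration rather than wired up by hand.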

Connector Architecture

  • connector-common - Shared connector utilities
  • Vector-specific connectors - Milvus, Pinecone, Qdrant, etc.
  • Traditional connectors - Elasticsearch, PostgreSQL, etc.

Development Workflow

Adding New Connectors

  1. Extend base connector classes in seatunnel-connectors-v2/connector-common
  2. Implement source/sink interfaces from seatunnel-api
  3. Add configuration schemas and validation
  4. Create unit tests in src/test/java
  5. Add E2E tests in seatunnel-e2e/seatunnel-connector-v2-e2e
  6. Update documentation and examples

Configuration Files

  • Job configurations use HOCON format (.conf files)
  • Examples located in seatunnel-examples/seatunnel-engine-examples/src/main/resources/examples/
  • Engine configuration in config/seatunnel.yaml

Key Patterns

  • Factory Pattern - Connector creation via factory classes
  • Plugin Discovery - Dynamic loading of connectors via seatunnel-plugin-discovery
  • Checkpoint/Savepoint - State management for fault tolerance
  • Pipeline Execution - A pipeline is the minimum unit of job execution
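
As a rough illustration of the factory and discovery patterns listed above, the sketch below keeps a registry of factories keyed by an identifier and builds a connector on request. All names here (`ConnectorRegistrySketch`, `SinkFactory`, `EchoSinkFactory`) are hypothetical; the project's actual discovery logic lives in seatunnel-plugin-discovery and is not reproduced here.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical registry; an assumption for illustration, not the seatunnel-plugin-discovery API.
public class ConnectorRegistrySketch {

    /** Minimal stand-in for a sink connector. */
    public interface Sink {
        String describe();
    }

    /** A factory knows its identifier and builds a sink from a flat options map. */
    public interface SinkFactory {
        String identifier();

        Sink create(Map<String, String> options);
    }

    /** Example factory used only to exercise the registry. */
    public static class EchoSinkFactory implements SinkFactory {
        @Override
        public String identifier() {
            return "Echo";
        }

        @Override
        public Sink create(Map<String, String> options) {
            String url = options.getOrDefault("url", "unknown");
            return () -> "echo sink -> " + url;
        }
    }

    private final Map<String, SinkFactory> factories = new HashMap<>();

    /** Registers a factory under its case-insensitive identifier. */
    public void register(SinkFactory factory) {
        factories.put(factory.identifier().toLowerCase(Locale.ROOT), factory);
    }

    /** Looks up the factory by identifier and delegates construction to it. */
    public Sink create(String identifier, Map<String, String> options) {
        SinkFactory factory = factories.get(identifier.toLowerCase(Locale.ROOT));
        if (factory == null) {
            throw new IllegalArgumentException("No sink factory registered for: " + identifier);
        }
        return factory.create(options);
    }

    public static void main(String[] args) {
        ConnectorRegistrySketch registry = new ConnectorRegistrySketch();
        registry.register(new EchoSinkFactory());

        Map<String, String> options = new HashMap<>();
        options.put("url", "localhost:19530");
        System.out.println(registry.create("echo", options).describe());
    }
}
```

The caller only names the connector; which class is instantiated is decided by whichever factory registered that identifier.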

Common Issues

Memory Configuration

  • Adjust JVM options in config/jvm_options, config/jvm_master_options, config/jvm_worker_options
  • Configure parallelism in job configuration based on data volume

Performance Tuning

  • Set appropriate batch_size in connector configurations
  • Configure parallelism in job environment settings
  • Monitor resource usage during data migration
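
To make the effect of batch_size concrete, here is a minimal sketch of a write buffer that flushes whenever it reaches a configured size. `BatchBufferSketch` is a hypothetical class, not connector code from this repository; it only shows why a larger batch means fewer, larger writes and more rows held in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical buffer; real connectors implement their own batching.
public class BatchBufferSketch<T> {

    private final int batchSize;
    private final Consumer<List<T>> flusher;
    private final List<T> buffer = new ArrayList<>();

    public BatchBufferSketch(int batchSize, Consumer<List<T>> flusher) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.flusher = flusher;
    }

    /** Buffers one row and flushes as soon as the batch is full. */
    public void add(T row) {
        buffer.add(row);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Writes out whatever is buffered; call once more at the end for the final partial batch. */
    public void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        flusher.accept(new ArrayList<>(buffer)); // pass a copy so the buffer can be reused
        buffer.clear();
    }

    public static void main(String[] args) {
        BatchBufferSketch<String> sink =
                new BatchBufferSketch<>(2, batch -> System.out.println("flush " + batch));
        for (int i = 0; i < 5; i++) {
            sink.add("row" + i);
        }
        sink.flush(); // flushes the remaining single row
    }
}
```

With five rows and a batch size of 2, the flusher is called three times (2, 2, and 1 rows).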

Vector Database Specific

  • Ensure Milvus version >= 2.3.6 for compatibility
  • Configure proper authentication tokens and connection parameters
  • Handle vector dimension mismatches in transformations
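
One way to handle a dimension mismatch is to fail fast, or to pad or truncate explicitly before writing. The sketch below shows both options in plain Java. `VectorDimensionSketch` and its methods are hypothetical helpers, not VTS transforms, and padding or truncating an embedding changes its meaning, so treat the second method as an assumption to validate against your own data.

```java
import java.util.Arrays;

// Hypothetical helpers; not part of the VTS transform modules.
public class VectorDimensionSketch {

    /** Returns the vector unchanged if its length matches, otherwise throws. */
    public static float[] requireDimension(float[] vector, int expected) {
        if (vector.length != expected) {
            throw new IllegalArgumentException(
                    "Vector dimension " + vector.length
                            + " does not match target dimension " + expected);
        }
        return vector;
    }

    /** Zero-pads short vectors and truncates long ones to the target dimension (lossy). */
    public static float[] padOrTruncate(float[] vector, int target) {
        return Arrays.copyOf(vector, target);
    }

    public static void main(String[] args) {
        float[] v = {0.1f, 0.2f, 0.3f};
        System.out.println(Arrays.toString(padOrTruncate(v, 5))); // [0.1, 0.2, 0.3, 0.0, 0.0]
        System.out.println(Arrays.toString(padOrTruncate(v, 2))); // [0.1, 0.2]
    }
}
```

Failing fast with `requireDimension` is usually the safer default, since a silent resize can hide a source and target that use different embedding models.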

File Structure Notes

  • Configuration files use YAML (engine configuration) or HOCON (job configurations)
  • Java follows Google Java Format (AOSP style)
  • Vue.js components follow standard Vue 3 + TypeScript patterns
  • Test files follow the *Test.java (unit tests) and *IT.java (integration tests) naming conventions