AI Agent Instructions for mollerdb

This document provides comprehensive instructions for AI agents working on the mollerdb project. It summarizes the design, decisions, and current state of the project.

Last Updated: 2025-11-01 18:54:00 UTC
Last User: wdconinc

1. Project Overview

The mollerdb project is a high-performance, dual-language Software Development Kit (SDK) for accessing the MOLLER experiment's analysis database. MOLLER (Measurement Of a Lepton Lepton Electroweak Reaction) is a precision physics experiment at Jefferson Lab that will measure the parity-violating asymmetry in electron-electron scattering. The SDK gives collaborators convenient access to the experiment's database without requiring proficiency in SQL.

2. Core Design Decisions

2.1. Overall SDK Architecture

The SDK will support both C++ and Python users without duplicating core logic. The chosen architecture is:

C++ Core Library (libmollerdb): A central, high-performance C++ library that contains all database interaction logic.
Python Bindings: A thin wrapper around the C++ core, exposing its functionality to Python.

2.2. Key Technologies

Build System: scikit-build-core was chosen as the modern PEP 517 build backend. It will orchestrate the build process by invoking CMake. This replaces an earlier suggestion of using setuptools with a custom CMakeBuild class.
C++ Database Driver: sqlpp23 is the designated library for connecting to and interacting with the PostgreSQL database from C++. It was chosen over libpqxx because it is a header-only library with fewer dependencies, making it more platform-independent and easier to manage. It will be included as a git submodule.
Python Bindings: pybind11 is the standard tool chosen to create the bindings between C++ and Python.
Data Interchange Format: Apache Arrow was identified as the critical technology for efficient, zero-copy data transfer between the C++ core and Python. C++ functions will query the database and construct Arrow Table objects, which can be converted to Pandas DataFrames in Python with minimal overhead.

2.3. Naming and Structure

After some discussion, the following naming convention was finalized for consistency:

Repository: JeffersonLab/mollerdb
Python Package Name: mollerdb
Compiled Python Module: mollerdb (the .so or .pyd extension module built from the pybind11 bindings).
The final Python package structure will be located in python/mollerdb/.

3. Associated Database Schema Design

The SDK design was informed by a proposed redesign of the underlying database schema (qwparity_schema.dbml). Key schema suggestions include:

Unified Results Table: Merging detector-specific tables (md_data, lumi_data, beam) into a single, flexible results table.
Generalized Sensitivities Table: Abstracting "slopes" into a generic sensitivities table to store any linear correlation (detector-monitor, detector-detector, etc.). This requires a master quantity lookup table.
Versioning: Adding versioning (e.g., valid_from_run, valid_to_run) to detector and quantity tables to ensure long-term reproducibility.
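
The versioning suggestion above can be illustrated with a small lookup. The following is a minimal sketch, not the actual schema: it uses an in-memory SQLite database as a stand-in for PostgreSQL, and the table name `quantity`, the `name` and `units` columns, and the sample rows are all hypothetical. Only `valid_from_run` and `valid_to_run` come from the text above. A NULL `valid_to_run` is assumed to mean "still valid".

```python
import sqlite3

# In-memory stand-in for the real database. Table and column names are
# hypothetical, except valid_from_run / valid_to_run from the schema notes.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE quantity (
        name TEXT NOT NULL,
        units TEXT NOT NULL,
        valid_from_run INTEGER NOT NULL,
        valid_to_run INTEGER          -- NULL is assumed to mean "still valid"
    )
    """
)
conn.executemany(
    "INSERT INTO quantity VALUES (?, ?, ?, ?)",
    [
        ("asym_main", "ppm", 1000, 1999),  # made-up sample rows
        ("asym_main", "ppb", 2000, None),
    ],
)


def quantity_for_run(conn, name, run):
    """Return the (name, units) row whose validity range contains `run`, or None."""
    return conn.execute(
        """
        SELECT name, units FROM quantity
        WHERE name = ?
          AND valid_from_run <= ?
          AND (valid_to_run IS NULL OR valid_to_run >= ?)
        """,
        (name, run, run),
    ).fetchone()
```

Calling `quantity_for_run(conn, "asym_main", 1500)` selects the first definition, while a run number of 2500 selects the second. This is the reproducibility property the versioning columns are meant to provide.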

4. Project Structure

Status: Schema integration is complete. The database schema from MOLLER-parity-schema is integrated as a git submodule and C++ headers are generated during the build process.

Documentation Structure:

Place topical documentation files in the docs/ directory
Keep high-level project overview in README.md
Keep agent-specific guidance in AGENTS.md
See docs/SCHEMA_INTEGRATION.md for detailed information on the schema integration approach

mollerdb/
├── include/           # C++ header files
├── src/              # C++ source files
│   └── Database.cpp  # Core database interaction logic
├── python/           # Python package
│   ├── mollerdb/     # Python package directory
│   │   └── __init__.py
│   └── bindings.cpp  # pybind11 bindings
├── thirdparty/       # Third-party dependencies (git submodules)
├── docs/             # Documentation (published to GitHub Pages)
├── CMakeLists.txt    # CMake build configuration
└── pyproject.toml    # Python package configuration

5. Development Guidelines

5.1. Building the Project

Python Package:

pip install -e .

C++ Library (via CMake):

mkdir build && cd build
cmake ..
make

5.2. Git Submodules

The project uses git submodules for dependencies like sqlpp23. Always ensure submodules are initialized:

git submodule update --init --recursive
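
As a hedged illustration of what "initialized" means in practice, the sketch below reads the `path = ...` entries from `.gitmodules` and reports any submodule directory that is missing or empty, which is the usual state before `git submodule update --init` has run. The helper name `uninitialized_submodules` is hypothetical and not part of the project.

```python
import re
from pathlib import Path

# Matches lines such as "    path = thirdparty/sqlpp23" in .gitmodules.
PATH_LINE = re.compile(r"^\s*path\s*=\s*(.+?)\s*$")


def uninitialized_submodules(repo_root):
    """List submodule paths whose directories are missing or empty."""
    root = Path(repo_root)
    gitmodules = root / ".gitmodules"
    if not gitmodules.exists():
        return []
    missing = []
    for line in gitmodules.read_text().splitlines():
        match = PATH_LINE.match(line)
        if not match:
            continue
        submodule_dir = root / match.group(1)
        # An uninitialized submodule is typically an empty directory.
        if not submodule_dir.is_dir() or not any(submodule_dir.iterdir()):
            missing.append(match.group(1))
    return missing
```

A build script or CI step could call `uninitialized_submodules(".")` and stop with a clear message if the returned list is not empty.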

5.3. Code Style

Follow the existing code style in each file
C++ code uses the C++23 standard (as specified in CMakeLists.txt)
Python code should follow PEP 8 guidelines
Keep the C++ core focused on database operations
Keep Python bindings thin and delegate to C++ core

5.4. Data Flow

C++ functions query the database using sqlpp23
Results are converted to Apache Arrow Table objects
Arrow tables are passed to Python without copying the underlying buffers (zero-copy)
Python users can convert Arrow tables to Pandas DataFrames

5.5. Naming Conventions

Python Package: mollerdb
Compiled Module: mollerdb
C++ Namespace: mollerdb
Use snake_case for Python code
Use camelCase for C++ code (following existing conventions)

6. Dependencies

6.1. C++ Dependencies

sqlpp23 (git submodule)
Apache Arrow C++ library
PostgreSQL client library (libpq)
pybind11

6.2. Python Dependencies

scikit-build-core (build)
pybind11 (build)
pyarrow (runtime - for Arrow table interaction)
pandas (optional - for DataFrame conversion in user code and examples)

7. CI/CD

The project uses GitHub Actions for continuous integration (see .github/workflows/). When modifying code:

Ensure builds pass on supported platforms
Git submodules must be checked out in CI workflows
Both C++ and Python components should be tested

8. Documentation

8.1. Documentation Structure

The project documentation is located in the docs/ directory and is published to GitHub Pages using Docsify. The documentation structure includes:

docs/README.md: Main documentation file with installation and usage instructions
docs/_sidebar.md: Sidebar navigation configuration
docs/index.html: Docsify configuration file
docs/.nojekyll: Disables Jekyll processing for GitHub Pages

8.2. Documentation Maintenance Guidelines

IMPORTANT: When developing new features or making changes, agents MUST keep the documentation up to date:

When Adding New Features:
- Update docs/README.md with usage examples for the new feature
- Add API reference documentation for new classes/methods
- Update installation instructions if new dependencies are required
- Add to the appropriate section in docs/_sidebar.md if creating new documentation pages
When Modifying Existing Features:
- Update all affected examples in docs/README.md
- Ensure API reference documentation reflects the changes
- Update any affected usage instructions
Periodic Documentation Review:
- Before completing major features, review the documentation for accuracy
- Verify that all code examples are correct and runnable
- Check that installation instructions are current
- Ensure API reference documentation matches the actual implementation
Documentation Testing:
- When making significant documentation changes, verify that:
  - All code examples are syntactically correct
  - Installation instructions work on supported platforms
  - Links to external resources are valid
  - The docsify sidebar navigation works correctly
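
The "syntactically correct" check can be partly automated. The sketch below is a hypothetical helper, not an existing project tool: it extracts fenced Python blocks from Markdown text and reports those that fail to parse. It checks syntax only and does not run the examples, so it does not show that they are correct at runtime.

```python
import ast
import re

# Build the fence marker programmatically so this example can itself be
# embedded in a Markdown file without closing its own code fence.
TICKS = "`" * 3
FENCE = re.compile(TICKS + r"python\n(.*?)" + TICKS, re.DOTALL)


def python_blocks_with_syntax_errors(markdown_text):
    """Return (block_index, error_message) for each Python block that fails to parse."""
    failures = []
    for index, match in enumerate(FENCE.finditer(markdown_text)):
        try:
            ast.parse(match.group(1))
        except SyntaxError as exc:
            failures.append((index, exc.msg))
    return failures
```

One way to use it is to read docs/README.md, pass its text to `python_blocks_with_syntax_errors`, and treat a non-empty result as a failed documentation check.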

The documentation is automatically deployed to GitHub Pages via the .github/workflows/pages.yml workflow when changes are pushed to the main branch.

9. Common Tasks

9.1. Adding a New Database Query Function

Implement the C++ function in src/Database.cpp
Use sqlpp23 for database interaction
Return results as Arrow Table objects
Expose the function in python/bindings.cpp using pybind11
Update Python package __init__.py if needed
Update documentation in docs/README.md with usage examples

9.2. Modifying Build Configuration

C++ build: Edit CMakeLists.txt
Python package: Edit pyproject.toml
Dependencies: Update both files as needed

9.3. Working with Arrow Tables

Construct Arrow tables in C++ using the Arrow C++ API
Pass table pointers through pybind11
Python receives Arrow tables that can be converted to Pandas DataFrames

10. Important Notes

Minimal Changes: Make the smallest possible changes to achieve goals
Testing: Run existing tests before and after changes
Dependencies: Avoid adding new dependencies unless absolutely necessary
Platform Independence: Keep code portable (Linux, macOS, Windows)
Performance: The C++ core is designed for high performance; maintain this in all changes
Documentation: Update docs/README.md and docstrings when changing public APIs

11. Security Considerations

Never commit database credentials or connection strings
Use environment variables for sensitive configuration
Validate all database inputs to prevent SQL injection (sqlpp23's typed query construction helps, but does not replace validating user-supplied values)
Review dependencies for known vulnerabilities
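
A minimal sketch of the environment-variable guideline follows. It reads the standard libpq variable names (`PGHOST`, `PGPORT`, `PGDATABASE`, `PGUSER`, `PGPASSWORD`). Whether mollerdb itself reads these variables is an assumption, and the helper name `connection_settings` is hypothetical.

```python
import os

REQUIRED = ("PGDATABASE", "PGUSER", "PGPASSWORD")


def connection_settings(environ=os.environ):
    """Collect connection settings from libpq-style environment variables.

    Raises RuntimeError instead of falling back to hard-coded credentials.
    """
    missing = [name for name in REQUIRED if name not in environ]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "host": environ.get("PGHOST", "localhost"),
        "port": int(environ.get("PGPORT", "5432")),
        "dbname": environ["PGDATABASE"],
        "user": environ["PGUSER"],
        "password": environ["PGPASSWORD"],
    }
```

Failing loudly when a credential is missing avoids the temptation to commit a default password "just for testing".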

12. Next Steps

The project is ready for ongoing development:

Implementing Core Logic: Flesh out the src/Database.cpp file to perform actual database queries using sqlpp23.
Integrating Apache Arrow: Add the logic to build Arrow Table objects from the query results and implement the C++-to-Python type conversions for these tables.
CI/CD Setup: Create a GitHub Actions workflow to build and test the C++ and Python components on various platforms, ensuring all dependencies (arrow) are correctly handled. The workflow must ensure git submodules are checked out to provide sqlpp23.
Documentation and Examples: Expand the docs/README.md and add an examples/ directory showing how to use the SDK in both Python and C++.

13. Contact

For questions about design decisions or project direction, contact the maintainers at wdconinc@jlab.org.
