Contributing to DNALLM

Thank you for your interest in contributing to DNALLM! This document provides guidelines and information for contributors.

Table of Contents

Code of Conduct
Getting Started
Development Setup
Contributing Guidelines
Code Style and Standards
Testing
Documentation
Pull Request Process
Issue Reporting
Release Process

Code of Conduct

This project adheres to a code of conduct that we expect all contributors to follow. Please be respectful and inclusive in all interactions.

Getting Started

Prerequisites

Python 3.10 or higher
Git
uv package manager (recommended)
CUDA-compatible GPU (optional, for GPU acceleration)

Fork and Clone

Fork the repository on GitHub

Clone your fork locally:

git clone https://github.com/your-username/DNALLM.git
cd DNALLM

Add the upstream repository:

git remote add upstream https://github.com/zhangtaolab/DNALLM.git

Development Setup

1. Environment Setup

# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create virtual environment
uv venv

# Activate virtual environment
source .venv/bin/activate  # Linux/macOS
# or
.venv\Scripts\activate     # Windows

# Install DNALLM in development mode
uv pip install -e '.[dev,test]'

2. Pre-commit Hooks (Optional)

# Install pre-commit
uv pip install pre-commit

# Install pre-commit hooks
pre-commit install

3. Verify Installation

# Run tests to verify everything works
pytest tests/ -v

# Check code quality (Ruff, flake8, and mypy)
python scripts/check_code.py

Contributing Guidelines

Types of Contributions

We welcome several types of contributions:

Bug fixes: Fix issues in existing code
New features: Add new functionality
Documentation: Improve or add documentation
Tests: Add or improve test coverage
Performance: Optimize existing code
Examples: Add new examples or tutorials

Workflow

Create a branch:

git checkout -b feature/your-feature-name
# or
git checkout -b fix/issue-description

Make your changes:
- Write clean, well-documented code
- Add tests for new functionality
- Update documentation as needed

Test your changes:

# Run all tests
pytest tests/ -v

# Run specific test categories
pytest tests/ -m "not slow"
pytest tests/ -m "unit"

# Run with coverage
pytest tests/ --cov=dnallm --cov-report=term-missing

Check code quality:

# Format code with Ruff
ruff format .

# Check code formatting
ruff format --check .

# Lint code with Ruff
ruff check . --statistics

# Additional flake8 check for MCP module
flake8 dnallm/mcp/ --max-line-length=79 --extend-ignore=E203,W503,C901,E402

# Type checking (relaxed settings)
mypy dnallm/ --ignore-missing-imports --no-strict-optional --disable-error-code=var-annotated --disable-error-code=assignment --disable-error-code=return-value --disable-error-code=arg-type --disable-error-code=index --disable-error-code=attr-defined --disable-error-code=operator --disable-error-code=call-overload --disable-error-code=valid-type --disable-error-code=no-redef --disable-error-code=dict-item --disable-error-code=return --disable-error-code=unreachable --disable-error-code=misc --disable-error-code=import-untyped

Commit your changes:

git add .
git commit -m "Add: brief description of changes"

Push your branch, then open a pull request on GitHub:

git push origin feature/your-feature-name

Code Style and Standards

Python Code Style

We use the following tools and standards:

Ruff: Code formatting and linting (line length: 79 characters)
flake8: Additional linting for MCP module compatibility
mypy: Type checking (with relaxed settings for development)

Pre-commit Checklist

⚠️ IMPORTANT: Before committing any code, you MUST run the following checks:

Code Formatting:

# Format all code
ruff format .

# Verify formatting is correct
ruff format --check .

Code Quality Checks:

# Run Ruff linting
ruff check . --statistics

# Run flake8 for MCP module
flake8 dnallm/mcp/ --max-line-length=79 --extend-ignore=E203,W503,C901,E402

Type Checking:

# Run mypy with relaxed settings
mypy dnallm/ --ignore-missing-imports --no-strict-optional --disable-error-code=var-annotated --disable-error-code=assignment --disable-error-code=return-value --disable-error-code=arg-type --disable-error-code=index --disable-error-code=attr-defined --disable-error-code=operator --disable-error-code=call-overload --disable-error-code=valid-type --disable-error-code=no-redef --disable-error-code=dict-item --disable-error-code=return --disable-error-code=unreachable --disable-error-code=misc --disable-error-code=import-untyped

Test Suite:

# Run all tests
pytest tests/ -v

# Run tests with coverage
pytest tests/ --cov=dnallm --cov-report=term-missing

Quick Validation Script:

# Option 1: Use the automated check script (recommended)
python scripts/check_code.py

# Option 2: Run all checks manually in one command
ruff format --check . && ruff check . --statistics && flake8 dnallm/mcp/ --max-line-length=79 --extend-ignore=E203,W503,C901,E402 && mypy dnallm/ --ignore-missing-imports --no-strict-optional --disable-error-code=var-annotated --disable-error-code=assignment --disable-error-code=return-value --disable-error-code=arg-type --disable-error-code=index --disable-error-code=attr-defined --disable-error-code=operator --disable-error-code=call-overload --disable-error-code=valid-type --disable-error-code=no-redef --disable-error-code=dict-item --disable-error-code=return --disable-error-code=unreachable --disable-error-code=misc --disable-error-code=import-untyped

All checks must pass before committing! CI will run the same checks and will fail if any issues are found.
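
The one-line manual command above is long and easy to mistype. As a minimal sketch (not the project's actual scripts/check_code.py, whose internals are not shown here), the same chain can be assembled from a list of the disabled mypy error codes. The names MYPY_DISABLED_CODES, build_check_commands, and as_shell_line are hypothetical helpers introduced only for illustration.

```python
import shlex

# Error codes disabled in the relaxed mypy invocation shown above.
MYPY_DISABLED_CODES = [
    "var-annotated", "assignment", "return-value", "arg-type", "index",
    "attr-defined", "operator", "call-overload", "valid-type", "no-redef",
    "dict-item", "return", "unreachable", "misc", "import-untyped",
]


def build_check_commands() -> list[list[str]]:
    """Return the four manual checks as argument lists, in order."""
    mypy_cmd = [
        "mypy", "dnallm/",
        "--ignore-missing-imports", "--no-strict-optional",
    ]
    mypy_cmd += [f"--disable-error-code={c}" for c in MYPY_DISABLED_CODES]
    flake8_cmd = [
        "flake8", "dnallm/mcp/", "--max-line-length=79",
        "--extend-ignore=E203,W503,C901,E402",
    ]
    return [
        ["ruff", "format", "--check", "."],
        ["ruff", "check", ".", "--statistics"],
        flake8_cmd,
        mypy_cmd,
    ]


def as_shell_line(commands: list[list[str]]) -> str:
    """Join the commands with && so the first failure stops the chain."""
    return " && ".join(shlex.join(cmd) for cmd in commands)


# Print the chained command so it can be pasted into a shell.
print(as_shell_line(build_check_commands()))
```

Keeping the codes in one list makes it easier to see which mypy checks are relaxed and to re-enable them one at a time.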

Code Check Scripts

We provide automated scripts to run all code quality checks at once:

Python Script (Cross-platform, Recommended)

# Basic usage - run code quality checks only (default)
python scripts/check_code.py

# Auto-fix issues where possible
python scripts/check_code.py --fix

# Show detailed output
python scripts/check_code.py --verbose

# Include test suite execution
python scripts/check_code.py --with-tests

# Get help
python scripts/check_code.py --help

Shell Script (Linux/macOS)

# Make executable (first time only)
chmod +x scripts/check_code.sh

# Run all checks
./scripts/check_code.sh

# Auto-fix issues
./scripts/check_code.sh --fix

# Verbose output
./scripts/check_code.sh --verbose

Batch Script (Windows)

# Run all checks
scripts\check_code.bat

# Auto-fix issues
scripts\check_code.bat --fix

# Verbose output
scripts\check_code.bat --verbose

What the Scripts Check

Code Formatting (Ruff)
Code Quality (Ruff linting)
MCP Module Compatibility (Flake8)
Type Checking (MyPy with relaxed settings)
Test Suite (Pytest; the Python script runs it only with --with-tests)
Test Coverage (Pytest with coverage)

Example Usage

# Quick code quality check (default - no tests)
$ python scripts/check_code.py
[INFO] Starting DNALLM code quality checks...
==========================================
[INFO] 1. Code Formatting...
[SUCCESS] Code formatting check completed successfully

[INFO] 2. Code Quality (Ruff)...
[SUCCESS] Code quality check completed successfully

[INFO] 3. Flake8 (MCP Module)...
[SUCCESS] Flake8 check for MCP module completed successfully

[INFO] 4. Type Checking (MyPy)...
[SUCCESS] Type checking with MyPy completed successfully

==========================================
[SUCCESS] All checks passed! ✅
[INFO] Your code is ready for commit.

# Include test suite execution
$ python scripts/check_code.py --with-tests
[INFO] Starting DNALLM code quality checks...
[INFO] 1. Code Formatting...
[SUCCESS] Code formatting check completed successfully
...
[INFO] 5. Test Suite...
[SUCCESS] Test suite execution completed successfully

# Auto-fix issues
$ python scripts/check_code.py --fix
[INFO] Starting DNALLM code quality checks...
[INFO] 1. Code Formatting...
[SUCCESS] Code formatting (auto-fix) completed successfully
...

Code Quality Standards

Ruff Configuration

Line length: 79 characters (not Black's default of 88)
Indentation: 4 spaces
Quote style: Double quotes
Import sorting: Automatic with isort compatibility
Error codes enabled: E4, E7, E9, F, W, B, C4, UP, N, S, T20, PT, Q, RUF

Flake8 Configuration (MCP Module)

Line length: 79 characters
Ignored errors: E203, W503, C901, E402
Purpose: Ensure MCP module compatibility

MyPy Configuration

Strict mode: Disabled for development
Missing imports: Ignored
Optional types: Not strictly enforced
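Disabled error codes: var-annotated, assignment, return-value, arg-type, index, attr-defined, operator, call-overload, valid-type, no-redef, dict-item, return, unreachable, misc, import-untyped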

File Organization

Maximum file size: < 1000 lines
Import order: Standard library → Third party → Local imports
Docstring style: Google-style
Type hints: Required for all function parameters and return values

Naming Conventions

Functions and variables: snake_case
Classes: PascalCase
Constants: UPPER_SNAKE_CASE
Private methods: _leading_underscore
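
The conventions above can be seen together in one short module. This is an illustrative sketch only: MAX_SEQUENCE_LENGTH, SequenceBatcher, and its methods are hypothetical names, not part of the DNALLM API.

```python
MAX_SEQUENCE_LENGTH = 512  # constant: UPPER_SNAKE_CASE


class SequenceBatcher:  # class: PascalCase
    """Split a sequence into fixed-size chunks (illustrative only)."""

    def __init__(self, chunk_size: int = MAX_SEQUENCE_LENGTH) -> None:
        self.chunk_size = chunk_size  # variable: snake_case

    def split_sequence(self, sequence: str) -> list[str]:
        """Return consecutive chunks of at most ``chunk_size`` bases."""
        return [
            sequence[start:start + self.chunk_size]
            for start in self._chunk_starts(len(sequence))
        ]

    def _chunk_starts(self, length: int) -> range:
        """Private helper (leading underscore): chunk start indices."""
        return range(0, length, self.chunk_size)
```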

Documentation

Use Google-style docstrings for functions and classes
Include type hints for all function parameters and return values
Add inline comments for complex logic

Example:

from typing import Any


def infer_sequence(self, sequence: str, model_name: str) -> dict[str, Any]:
    """Infer the properties of a DNA sequence.

    Args:
        sequence: DNA sequence string (A, T, G, C)
        model_name: Name of the model to use for inference

    Returns:
        Dictionary containing inference results and metadata

    Raises:
        ValueError: If sequence contains invalid characters
        ModelNotFoundError: If specified model is not available
    """
    # Implementation here
    ...

File Organization

Keep files focused and reasonably sized (< 1000 lines)
Use meaningful file and directory names
Group related functionality together
Follow the existing project structure

Testing

Test Structure

Tests are organized in the tests/ directory:

tests/
├── pytest.ini              # Test configuration
├── TESTING.md              # Detailed testing guide
├── inference/              # Inference module tests
├── utils/                  # Utility function tests
├── datahandling/           # Data handling tests
├── finetune/               # Training tests
└── test_data/              # Test data files

Writing Tests

Test file naming: test_*.py
Test class naming: Test*
Test method naming: test_*
Use descriptive test names that explain what is being tested

Example:

import pytest
from dnallm.utils.sequence import validate_dna_sequence


class TestSequenceValidation:
    """Test cases for DNA sequence validation."""

    def test_valid_sequence(self):
        """Test validation of valid DNA sequences."""
        assert validate_dna_sequence("ATCG")
        assert validate_dna_sequence("ATCGATCG")

    def test_invalid_characters(self):
        """Test validation rejects invalid characters."""
        with pytest.raises(ValueError):
            validate_dna_sequence("ATCGX")

    @pytest.mark.slow
    def test_large_sequence(self):
        """Test validation of large sequences."""
        large_seq = "ATCG" * 1000
        assert validate_dna_sequence(large_seq)
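
To make the behaviour these tests expect concrete, here is a hypothetical stand-in for validate_dna_sequence. It is a sketch inferred only from the tests above (return True for valid input, raise ValueError otherwise); the real function in dnallm may differ, for example in how it treats lowercase bases or ambiguity codes such as N, which this sketch rejects.

```python
# Assumption: only uppercase A, T, G, C count as valid bases.
VALID_BASES = frozenset("ATGC")


def validate_dna_sequence(sequence: str) -> bool:
    """Return True for a valid sequence; raise ValueError otherwise."""
    invalid = sorted(set(sequence) - VALID_BASES)
    if invalid:
        raise ValueError(f"Invalid characters in sequence: {invalid}")
    return True
```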

Test Markers

Use appropriate markers for different test types:

@pytest.mark.slow: Long-running tests
@pytest.mark.integration: Integration tests
@pytest.mark.unit: Unit tests
@pytest.mark.pdf: Tests that generate PDF files

Running Tests

# Run all tests
pytest

# Run specific test categories
pytest -m "not slow"          # Skip slow tests
pytest -m "unit"              # Only unit tests
pytest -m "integration"       # Only integration tests

# Run with coverage
pytest --cov=dnallm --cov-report=html

# Run specific test files
pytest tests/inference/test_inference.py

Documentation

Documentation Structure

Documentation is organized in the docs/ directory:

docs/
├── index.md                 # Main documentation page
├── getting_started/         # Installation and setup guides
├── tutorials/              # Step-by-step tutorials
├── api/                    # API reference
├── concepts/               # Core concepts
└── faq/                    # Frequently asked questions

Writing Documentation

Use Markdown for all documentation
Include code examples that can be run
Keep documentation up-to-date with code changes
Use clear, concise language
Include diagrams for complex concepts

Building Documentation

# Install documentation dependencies
uv pip install -e '.[docs]'

# Build documentation
mkdocs build

# Serve documentation locally
mkdocs serve

Pull Request Process

Before Submitting

Run the complete pre-commit checklist (see Pre-commit Checklist above):

# Quick validation - all checks must pass
ruff format --check . && ruff check . --statistics && flake8 dnallm/mcp/ --max-line-length=79 --extend-ignore=E203,W503,C901,E402 && mypy dnallm/ --ignore-missing-imports --no-strict-optional --disable-error-code=var-annotated --disable-error-code=assignment --disable-error-code=return-value --disable-error-code=arg-type --disable-error-code=index --disable-error-code=attr-defined --disable-error-code=operator --disable-error-code=call-overload --disable-error-code=valid-type --disable-error-code=no-redef --disable-error-code=dict-item --disable-error-code=return --disable-error-code=unreachable --disable-error-code=misc --disable-error-code=import-untyped

Ensure all tests pass:

pytest tests/ -v --cov=dnallm --cov-report=term-missing

Update documentation if needed
Add tests for new functionality
Update CHANGELOG.md if applicable
Verify CI compatibility: Your local checks should match what CI runs

PR Description Template

## Description
Brief description of changes

## Type of Change
- [ ] Bug fix
- [ ] New feature
- [ ] Documentation update
- [ ] Performance improvement
- [ ] Test addition/improvement

## Testing
- [ ] All existing tests pass
- [ ] New tests added for new functionality
- [ ] Manual testing completed

## Checklist
- [ ] Code follows project style guidelines
- [ ] Self-review completed
- [ ] Documentation updated
- [ ] No breaking changes (or clearly documented)

## Related Issues
Closes #issue_number

Review Process

Automated checks must pass (CI/CD)
Code review by maintainers
Testing on different environments
Documentation review

Issue Reporting

Bug Reports

When reporting bugs, please include:

Clear description of the issue
Steps to reproduce
Expected vs actual behavior
Environment information:
- Python version
- Operating system
- DNALLM version
- Dependency versions
Minimal code example (if applicable)
Error messages and stack traces
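
The environment details listed above can be gathered with a few standard-library calls. The sketch below is one possible way to do it. It assumes the installed distribution is named dnallm, and torch is included only as an example dependency; adjust the tuple to the packages relevant to your report.

```python
import platform
import sys
from importlib import metadata


def package_version(name: str) -> str:
    """Return the installed version of a package, or a placeholder."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def collect_environment(
    packages: tuple[str, ...] = ("dnallm", "torch"),
) -> dict[str, str]:
    """Collect Python, OS, and package versions for a bug report."""
    info = {
        "python": sys.version.split()[0],
        "os": platform.platform(),
    }
    for name in packages:
        info[name] = package_version(name)
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Paste the printed lines into the "Environment information" part of the report.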

Feature Requests

For feature requests, please include:

Clear description of the feature
Use case and motivation
Proposed implementation (if you have ideas)
Alternatives considered

Issue Templates

Use the provided issue templates when creating issues on GitHub.

Release Process

Version Numbering

We follow Semantic Versioning:

MAJOR: Incompatible API changes
MINOR: New functionality (backward compatible)
PATCH: Bug fixes (backward compatible)
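
The rules above translate into a simple bump operation: raising one component resets every component to its right. The function below is a minimal sketch for plain MAJOR.MINOR.PATCH strings. bump_version is a hypothetical helper, not part of the project's release tooling, and it does not handle pre-release or build suffixes.

```python
def bump_version(version: str, part: str) -> str:
    """Return the next version after bumping the given part."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":  # incompatible API change
        return f"{major + 1}.0.0"
    if part == "minor":  # backward-compatible feature
        return f"{major}.{minor + 1}.0"
    if part == "patch":  # backward-compatible bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown version part: {part!r}")
```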

Release Checklist

Update version in pyproject.toml
Update CHANGELOG.md
Run full test suite
Update documentation
Create release tag
Build and publish to PyPI

Development Tips

Performance Optimization

Use profiling tools to identify bottlenecks
Consider vectorization for numerical operations
Implement caching for expensive computations
Use async/await for I/O operations
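
As one illustration of the caching tip, functools.lru_cache from the standard library memoizes a pure function so that repeated calls with the same argument skip the computation. The reverse_complement function here is a hypothetical stand-in for a more expensive operation and assumes uppercase A/T/G/C input.

```python
from functools import lru_cache

# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ATGC", "TACG")


@lru_cache(maxsize=1024)
def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]
```

Calling reverse_complement.cache_info() reports hits and misses, which helps confirm that the cache is actually being used before relying on it.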

Memory Management

Monitor memory usage during development
Clean up resources properly
Use context managers for file operations
Consider lazy loading for large datasets
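
The context-manager and lazy-loading tips combine naturally in a generator that reads a file one record at a time. The following is a minimal sketch for FASTA-style text files. iter_fasta is a hypothetical helper, not a DNALLM function, and it ignores any sequence lines that appear before the first header.

```python
from collections.abc import Iterator
from pathlib import Path


def iter_fasta(path: Path) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs one record at a time."""
    header: str | None = None
    chunks: list[str] = []
    # The with block closes the file when iteration finishes or fails.
    with path.open() as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

Because records are yielded one by one, only the current record is held in memory, regardless of file size.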

Debugging

Use logging instead of print statements
Add debugging information to error messages
Use breakpoints in development
Test edge cases thoroughly
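
A short sketch of the first two tips: a module-level logger replaces print statements, and the error messages carry enough context to debug from. The logger name dnallm.example and the gc_content function are hypothetical, chosen only for illustration.

```python
import logging

logger = logging.getLogger("dnallm.example")  # hypothetical logger name


def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    if not sequence:
        raise ValueError("Cannot compute GC content of an empty sequence")
    invalid = sorted(set(sequence) - set("ATGC"))
    if invalid:
        # Include the offending characters and the input size in the error.
        raise ValueError(
            f"Invalid characters {invalid} in sequence "
            f"of length {len(sequence)}"
        )
    ratio = sum(base in "GC" for base in sequence) / len(sequence)
    logger.debug("GC content %.3f for %d bases", ratio, len(sequence))
    return ratio
```

Calling logging.basicConfig(level=logging.DEBUG) in a development session makes the debug messages visible without changing the function.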

Quick Reference

Essential Commands

# Setup development environment
uv venv
source .venv/bin/activate
uv pip install -e '.[dev,test]'

# Pre-commit validation (run before every commit)
# Option 1: Use automated script (recommended, code quality only)
python scripts/check_code.py

# Option 2: Include test suite execution
python scripts/check_code.py --with-tests

# Option 3: Manual validation
ruff format --check . && ruff check . --statistics && flake8 dnallm/mcp/ --max-line-length=79 --extend-ignore=E203,W503,C901,E402 && mypy dnallm/ --ignore-missing-imports --no-strict-optional --disable-error-code=var-annotated --disable-error-code=assignment --disable-error-code=return-value --disable-error-code=arg-type --disable-error-code=index --disable-error-code=attr-defined --disable-error-code=operator --disable-error-code=call-overload --disable-error-code=valid-type --disable-error-code=no-redef --disable-error-code=dict-item --disable-error-code=return --disable-error-code=unreachable --disable-error-code=misc --disable-error-code=import-untyped

# Auto-fix code issues
python scripts/check_code.py --fix

# Format code manually
ruff format .

# Run tests
pytest tests/ -v --cov=dnallm --cov-report=term-missing

# Build documentation
mkdocs build
mkdocs serve

Common Issues and Solutions

Code check script fails:
- Make sure you're in the DNALLM root directory
- Ensure the virtual environment is activated: source .venv/bin/activate
- Install dependencies: uv pip install -e '.[dev,test]'

Ruff formatting errors:
- Auto-fix: python scripts/check_code.py --fix
- Manual fix: ruff format .

Import errors:
- Check import order and use # noqa: E402 for necessary late imports
- Run ruff check . --fix to auto-fix some import issues

Type checking errors:
- Most error codes are disabled during development, but critical errors should still be fixed
- Check specific files: mypy dnallm/specific_file.py

Test failures:
- Run specific test files: pytest tests/specific_test.py -v
- Run with verbose output: python scripts/check_code.py --with-tests --verbose

Script permission errors (Linux/macOS):
- Make executable: chmod +x scripts/check_code.sh
- Or use Python script: python scripts/check_code.py

Getting Help

GitHub Issues: For bug reports and feature requests
GitHub Discussions: For questions and general discussion
Documentation: Check the docs/ directory first
Examples: Look at the example/ directory for usage patterns

Recognition

Contributors will be recognized in:

CONTRIBUTORS.md file
Release notes
Project documentation

Thank you for contributing to DNALLM! 🧬
