Thank you for your interest in contributing to DNALLM! This document provides guidelines and information for contributors.
- Code of Conduct
- Getting Started
- Development Setup
- Contributing Guidelines
- Code Style and Standards
- Testing
- Documentation
- Pull Request Process
- Issue Reporting
- Release Process
This project adheres to a code of conduct that we expect all contributors to follow. Please be respectful and inclusive in all interactions.
- Python 3.10 or higher
- Git
- uv package manager (recommended)
- CUDA-compatible GPU (optional, for GPU acceleration)
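
The prerequisites above can be checked from Python before starting the setup. This is a minimal sketch, not part of DNALLM: the helper names `python_ok` and `missing_tools` are hypothetical, and it only verifies the interpreter version and that `git` and `uv` are on `PATH`.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the prerequisites


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the interpreter meets the minimum version."""
    return tuple(version_info[:2]) >= minimum


def missing_tools(tools=("git", "uv")):
    """Return the tools that cannot be found on PATH (hypothetical helper)."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("Python OK:", python_ok())
    print("Missing tools:", missing_tools() or "none")
```

Since `uv` is only recommended, a missing `uv` is a hint rather than a hard failure.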
- Fork the repository on GitHub
- Clone your fork locally:
    git clone https://github.com/your-username/DNALLM.git
    cd DNALLM
- Add the upstream repository:
    git remote add upstream https://github.com/zhangtaolab/DNALLM.git
# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create virtual environment
uv venv
# Activate virtual environment
source .venv/bin/activate # Linux/macOS
# or
.venv\Scripts\activate # Windows
# Install DNALLM in development mode
uv pip install -e '.[dev,test]'

# Install pre-commit
uv pip install pre-commit
# Install pre-commit hooks
pre-commit install

# Run tests to verify everything works
pytest tests/ -v
# Check code quality
black --check .
isort --check-only .
flake8 .
mypy dnallm/

We welcome several types of contributions:
- Bug fixes: Fix issues in existing code
- New features: Add new functionality
- Documentation: Improve or add documentation
- Tests: Add or improve test coverage
- Performance: Optimize existing code
- Examples: Add new examples or tutorials
- Create a branch:
    git checkout -b feature/your-feature-name
    # or
    git checkout -b fix/issue-description
- Make your changes:
  - Write clean, well-documented code
  - Add tests for new functionality
  - Update documentation as needed
- Test your changes:
    # Run all tests
    pytest tests/ -v
    # Run specific test categories
    pytest tests/ -m "not slow"
    pytest tests/ -m "unit"
    # Run with coverage
    pytest tests/ --cov=dnallm --cov-report=term-missing
- Check code quality:
    # Format code with Ruff
    ruff format .
    # Check code formatting
    ruff format --check .
    # Lint code with Ruff
    ruff check . --statistics
    # Additional flake8 check for MCP module
    flake8 dnallm/mcp/ --max-line-length=79 --extend-ignore=E203,W503,C901,E402
    # Type checking (relaxed settings)
    mypy dnallm/ --ignore-missing-imports --no-strict-optional --disable-error-code=var-annotated --disable-error-code=assignment --disable-error-code=return-value --disable-error-code=arg-type --disable-error-code=index --disable-error-code=attr-defined --disable-error-code=operator --disable-error-code=call-overload --disable-error-code=valid-type --disable-error-code=no-redef --disable-error-code=dict-item --disable-error-code=return --disable-error-code=unreachable --disable-error-code=misc --disable-error-code=import-untyped
- Commit your changes:
    git add .
    git commit -m "Add: brief description of changes"
- Push and create PR:
    git push origin feature/your-feature-name
We follow these standards:
- Ruff: Code formatting and linting (line length: 79 characters)
- flake8: Additional linting for MCP module compatibility
- mypy: Type checking (with relaxed settings for development)
- Code Formatting:
    # Format all code
    ruff format .
    # Verify formatting is correct
    ruff format --check .
- Code Quality Checks:
    # Run Ruff linting
    ruff check . --statistics
    # Run flake8 for MCP module
    flake8 dnallm/mcp/ --max-line-length=79 --extend-ignore=E203,W503,C901,E402
- Type Checking:
    # Run mypy with relaxed settings
    mypy dnallm/ --ignore-missing-imports --no-strict-optional --disable-error-code=var-annotated --disable-error-code=assignment --disable-error-code=return-value --disable-error-code=arg-type --disable-error-code=index --disable-error-code=attr-defined --disable-error-code=operator --disable-error-code=call-overload --disable-error-code=valid-type --disable-error-code=no-redef --disable-error-code=dict-item --disable-error-code=return --disable-error-code=unreachable --disable-error-code=misc --disable-error-code=import-untyped
- Test Suite:
    # Run all tests
    pytest tests/ -v
    # Run tests with coverage
    pytest tests/ --cov=dnallm --cov-report=term-missing
- Quick Validation Script:
    # Option 1: Use the automated check script (recommended)
    python scripts/check_code.py
    # Option 2: Run all checks manually in one command
    ruff format --check . && ruff check . --statistics && flake8 dnallm/mcp/ --max-line-length=79 --extend-ignore=E203,W503,C901,E402 && mypy dnallm/ --ignore-missing-imports --no-strict-optional --disable-error-code=var-annotated --disable-error-code=assignment --disable-error-code=return-value --disable-error-code=arg-type --disable-error-code=index --disable-error-code=attr-defined --disable-error-code=operator --disable-error-code=call-overload --disable-error-code=valid-type --disable-error-code=no-redef --disable-error-code=dict-item --disable-error-code=return --disable-error-code=unreachable --disable-error-code=misc --disable-error-code=import-untyped
All checks must pass before committing! CI will run the same checks and will fail if any issues are found.
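
The long mypy invocation is easier to read when the relaxed error codes are kept in a list and the command is assembled from it. The following is a sketch of that idea only; it is not the implementation of `scripts/check_code.py`, and the names `build_checks` and `run_checks` are hypothetical. It reproduces the documented commands and stops at the first failing check, as the `&&` chain does.

```python
import shlex
import subprocess

# Error codes relaxed in the documented mypy invocation.
MYPY_DISABLED_CODES = [
    "var-annotated", "assignment", "return-value", "arg-type", "index",
    "attr-defined", "operator", "call-overload", "valid-type", "no-redef",
    "dict-item", "return", "unreachable", "misc", "import-untyped",
]


def build_checks():
    """Return the documented checks as argument lists, in order."""
    mypy = [
        "mypy", "dnallm/",
        "--ignore-missing-imports", "--no-strict-optional",
    ]
    mypy += [f"--disable-error-code={code}" for code in MYPY_DISABLED_CODES]
    return [
        ["ruff", "format", "--check", "."],
        ["ruff", "check", ".", "--statistics"],
        ["flake8", "dnallm/mcp/", "--max-line-length=79",
         "--extend-ignore=E203,W503,C901,E402"],
        mypy,
    ]


def run_checks(checks, runner=subprocess.run):
    """Run checks in order and stop at the first failure, like `&&`."""
    for cmd in checks:
        print("$", shlex.join(cmd))
        if runner(cmd).returncode != 0:
            return False
    return True
```

Calling `run_checks(build_checks())` from the repository root would run the four tools in sequence, assuming they are installed in the active environment; with the default runner, a missing tool raises `FileNotFoundError`.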
We provide automated scripts to run all code quality checks at once:
# Basic usage - run code quality checks only (default)
python scripts/check_code.py
# Auto-fix issues where possible
python scripts/check_code.py --fix
# Show detailed output
python scripts/check_code.py --verbose
# Include test suite execution
python scripts/check_code.py --with-tests
# Get help
python scripts/check_code.py --help

# Make executable (first time only)
chmod +x scripts/check_code.sh
# Run all checks
./scripts/check_code.sh
# Auto-fix issues
./scripts/check_code.sh --fix
# Verbose output
./scripts/check_code.sh --verbose

# Run all checks
scripts\check_code.bat
# Auto-fix issues
scripts\check_code.bat --fix
# Verbose output
scripts\check_code.bat --verbose

- Code Formatting (Ruff)
- Code Quality (Ruff linting)
- MCP Module Compatibility (Flake8)
- Type Checking (MyPy with relaxed settings)
- Test Suite (Pytest)
- Test Coverage (Pytest with coverage)
# Quick code quality check (default - no tests)
$ python scripts/check_code.py
[INFO] Starting DNALLM code quality checks...
==========================================
[INFO] 1. Code Formatting...
[SUCCESS] Code formatting check completed successfully
[INFO] 2. Code Quality (Ruff)...
[SUCCESS] Code quality check completed successfully
[INFO] 3. Flake8 (MCP Module)...
[SUCCESS] Flake8 check for MCP module completed successfully
[INFO] 4. Type Checking (MyPy)...
[SUCCESS] Type checking with MyPy completed successfully
==========================================
[SUCCESS] All checks passed! ✅
[INFO] Your code is ready for commit.
# Include test suite execution
$ python scripts/check_code.py --with-tests
[INFO] Starting DNALLM code quality checks...
[INFO] 1. Code Formatting...
[SUCCESS] Code formatting check completed successfully
...
[INFO] 5. Test Suite...
[SUCCESS] Test suite execution completed successfully
# Auto-fix issues
$ python scripts/check_code.py --fix
[INFO] Starting DNALLM code quality checks...
[INFO] 1. Code Formatting...
[SUCCESS] Code formatting (auto-fix) completed successfully
...

- Line length: 79 characters (not 88 like Black)
- Indentation: 4 spaces
- Quote style: Double quotes
- Import sorting: Automatic with isort compatibility
- Error codes enabled: E4, E7, E9, F, W, B, C4, UP, N, S, T20, PT, Q, RUF
- Line length: 79 characters
- Ignored errors: E203, W503, C901, E402
- Purpose: Ensure MCP module compatibility
- Strict mode: Disabled for development
- Missing imports: Ignored
- Optional types: Not strictly enforced
- Disabled error codes: Multiple codes disabled for development flexibility
- Maximum file size: < 1000 lines
- Import order: Standard library → Third party → Local imports
- Docstring style: Google-style
- Type hints: Required for all function parameters and return values
- Functions and variables: snake_case
- Classes: PascalCase
- Constants: UPPER_SNAKE_CASE
- Private methods: _leading_underscore
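
As an illustration of these naming conventions, here is a small sketch. `SequenceStats` and `GC_BASES` are hypothetical names invented for this example and do not refer to anything in DNALLM.

```python
GC_BASES = frozenset("GC")  # constant: UPPER_SNAKE_CASE


class SequenceStats:  # class: PascalCase
    """Hypothetical helper used only to illustrate naming."""

    def __init__(self, sequence: str) -> None:
        self.sequence = sequence.upper()  # variable: snake_case

    def gc_content(self) -> float:  # public method: snake_case
        """Return the fraction of G and C bases in the sequence."""
        if not self.sequence:
            return 0.0
        return self._count_gc() / len(self.sequence)

    def _count_gc(self) -> int:  # private method: leading underscore
        """Count the G and C bases."""
        return sum(1 for base in self.sequence if base in GC_BASES)
```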
- Use Google-style docstrings for functions and classes
- Include type hints for all function parameters and return values
- Add inline comments for complex logic
Example:
from typing import Any


def infer_sequence(self, sequence: str, model_name: str) -> dict[str, Any]:
    """Infer the properties of a DNA sequence.

    Args:
        sequence: DNA sequence string (A, T, G, C)
        model_name: Name of the model to use for inference

    Returns:
        Dictionary containing inference results and metadata

    Raises:
        ValueError: If sequence contains invalid characters
        ModelNotFoundError: If specified model is not available
    """
    # Implementation here
    pass

- Keep files focused and reasonably sized (< 1000 lines)
- Use meaningful file and directory names
- Group related functionality together
- Follow the existing project structure
Tests are organized in the tests/ directory:
tests/
├── pytest.ini # Test configuration
├── TESTING.md # Detailed testing guide
├── inference/ # Inference module tests
├── utils/ # Utility function tests
├── datahandling/ # Data handling tests
├── finetune/ # Training tests
└── test_data/ # Test data files
- Test file naming: test_*.py
- Test class naming: Test*
- Test method naming: test_*
- Use descriptive test names that explain what is being tested
Example:
import pytest

from dnallm.utils.sequence import validate_dna_sequence


class TestSequenceValidation:
    """Test cases for DNA sequence validation."""

    def test_valid_sequence(self):
        """Test validation of valid DNA sequences."""
        assert validate_dna_sequence("ATCG") is True
        assert validate_dna_sequence("ATCGATCG") is True

    def test_invalid_characters(self):
        """Test validation rejects invalid characters."""
        with pytest.raises(ValueError):
            validate_dna_sequence("ATCGX")

    @pytest.mark.slow
    def test_large_sequence(self):
        """Test validation of large sequences."""
        large_seq = "ATCG" * 1000
        assert validate_dna_sequence(large_seq) is True

Use appropriate markers for different test types:
- @pytest.mark.slow: Long-running tests
- @pytest.mark.integration: Integration tests
- @pytest.mark.unit: Unit tests
- @pytest.mark.pdf: Tests that generate PDF files
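
Custom markers need to be registered, otherwise pytest warns about unknown marks. The project presumably declares the markers above in tests/pytest.ini (an assumption; check that file first). If a new marker ever has to be added from Python instead, one way is a `pytest_configure` hook in a `conftest.py`, sketched below. The `MARKERS` dictionary is a name chosen for this example.

```python
# conftest.py (sketch; the project may already register markers elsewhere)
MARKERS = {
    "slow": "long-running tests",
    "integration": "integration tests",
    "unit": "unit tests",
    "pdf": "tests that generate PDF files",
}


def pytest_configure(config):
    """Register the custom markers so pytest does not warn about them."""
    for name, description in MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

With the markers registered, `pytest --markers` lists them alongside the built-in ones.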
# Run all tests
pytest
# Run specific test categories
pytest -m "not slow" # Skip slow tests
pytest -m "unit" # Only unit tests
pytest -m "integration" # Only integration tests
# Run with coverage
pytest --cov=dnallm --cov-report=html
# Run specific test files
pytest tests/inference/test_inference.py

Documentation is organized in the docs/ directory:
docs/
├── index.md # Main documentation page
├── getting_started/ # Installation and setup guides
├── tutorials/ # Step-by-step tutorials
├── api/ # API reference
├── concepts/ # Core concepts
└── faq/ # Frequently asked questions
- Use Markdown for all documentation
- Include code examples that can be run
- Keep documentation up-to-date with code changes
- Use clear, concise language
- Include diagrams for complex concepts
# Install documentation dependencies
uv pip install -e '.[docs]'
# Build documentation
mkdocs build
# Serve documentation locally
mkdocs serve

- Run the complete pre-commit checklist (see Pre-commit Checklist above):
    # Quick validation - all checks must pass
    ruff format --check . && ruff check . --statistics && flake8 dnallm/mcp/ --max-line-length=79 --extend-ignore=E203,W503,C901,E402 && mypy dnallm/ --ignore-missing-imports --no-strict-optional --disable-error-code=var-annotated --disable-error-code=assignment --disable-error-code=return-value --disable-error-code=arg-type --disable-error-code=index --disable-error-code=attr-defined --disable-error-code=operator --disable-error-code=call-overload --disable-error-code=valid-type --disable-error-code=no-redef --disable-error-code=dict-item --disable-error-code=return --disable-error-code=unreachable --disable-error-code=misc --disable-error-code=import-untyped
- Ensure all tests pass:
    pytest tests/ -v --cov=dnallm --cov-report=term-missing
- Update documentation if needed
- Add tests for new functionality
- Update CHANGELOG.md if applicable
- Verify CI compatibility: Your local checks should match what CI runs
## Description
Brief description of changes
## Type of Change
- [ ] Bug fix
- [ ] New feature
- [ ] Documentation update
- [ ] Performance improvement
- [ ] Test addition/improvement
## Testing
- [ ] All existing tests pass
- [ ] New tests added for new functionality
- [ ] Manual testing completed
## Checklist
- [ ] Code follows project style guidelines
- [ ] Self-review completed
- [ ] Documentation updated
- [ ] No breaking changes (or clearly documented)
## Related Issues
Closes #issue_number

- Automated checks must pass (CI/CD)
- Code review by maintainers
- Testing on different environments
- Documentation review
When reporting bugs, please include:
- Clear description of the issue
- Steps to reproduce
- Expected vs actual behavior
- Environment information:
  - Python version
  - Operating system
  - DNALLM version
  - Dependency versions
- Minimal code example (if applicable)
- Error messages and stack traces
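
The environment information for a bug report can be gathered with the standard library. This is a minimal sketch: the function names are hypothetical, and the default package list (the distribution name `dnallm`, plus `torch` and `transformers` as typical dependencies) is an assumption to adjust to whatever is relevant to the issue.

```python
import platform
import sys
from importlib import metadata


def package_version(name: str) -> str:
    """Return the installed version of a package, or a placeholder."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report(
    packages=("dnallm", "torch", "transformers"),  # assumed names
) -> dict:
    """Collect the details requested for bug reports."""
    report = {
        "python": sys.version.split()[0],
        "os": platform.platform(),
    }
    report.update({name: package_version(name) for name in packages})
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

The printed lines can be pasted directly into the issue.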
For feature requests, please include:
- Clear description of the feature
- Use case and motivation
- Proposed implementation (if you have ideas)
- Alternatives considered
Use the provided issue templates when creating issues on GitHub.
We follow Semantic Versioning:
- MAJOR: Incompatible API changes
- MINOR: New functionality (backward compatible)
- PATCH: Bug fixes (backward compatible)
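
To make the three version parts concrete, here is a small sketch of how a version string changes for each kind of release. `bump_version` is a hypothetical helper, not a DNALLM tool, and it handles only plain MAJOR.MINOR.PATCH strings without pre-release or build suffixes.

```python
def bump_version(version: str, part: str) -> str:
    """Return `version` with the given part incremented.

    Lower-order parts reset to zero, as Semantic Versioning requires.
    """
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


print(bump_version("1.4.2", "major"))  # 2.0.0
print(bump_version("1.4.2", "minor"))  # 1.5.0
print(bump_version("1.4.2", "patch"))  # 1.4.3
```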
- Update version in pyproject.toml
- Update CHANGELOG.md
- Run full test suite
- Update documentation
- Create release tag
- Build and publish to PyPI
- Use profiling tools to identify bottlenecks
- Consider vectorization for numerical operations
- Implement caching for expensive computations
- Use async/await for I/O operations
- Monitor memory usage during development
- Clean up resources properly
- Use context managers for file operations
- Consider lazy loading for large datasets
- Use logging instead of print statements
- Add debugging information to error messages
- Use breakpoints in development
- Test edge cases thoroughly
# Setup development environment
uv venv
source .venv/bin/activate
uv pip install -e '.[dev,test]'
# Pre-commit validation (run before every commit)
# Option 1: Use automated script (recommended, code quality only)
python scripts/check_code.py
# Option 2: Include test suite execution
python scripts/check_code.py --with-tests
# Option 3: Manual validation
ruff format --check . && ruff check . --statistics && flake8 dnallm/mcp/ --max-line-length=79 --extend-ignore=E203,W503,C901,E402 && mypy dnallm/ --ignore-missing-imports --no-strict-optional --disable-error-code=var-annotated --disable-error-code=assignment --disable-error-code=return-value --disable-error-code=arg-type --disable-error-code=index --disable-error-code=attr-defined --disable-error-code=operator --disable-error-code=call-overload --disable-error-code=valid-type --disable-error-code=no-redef --disable-error-code=dict-item --disable-error-code=return --disable-error-code=unreachable --disable-error-code=misc --disable-error-code=import-untyped
# Auto-fix code issues
python scripts/check_code.py --fix
# Format code manually
ruff format .
# Run tests
pytest tests/ -v --cov=dnallm --cov-report=term-missing
# Build documentation
mkdocs build
mkdocs serve

- Code check script fails:
  - Make sure you're in the DNALLM root directory
  - Ensure virtual environment is activated: source .venv/bin/activate
  - Install dependencies: uv pip install -e '.[dev,test]'
- Ruff formatting errors:
  - Auto-fix: python scripts/check_code.py --fix
  - Manual fix: ruff format .
- Import errors:
  - Check import order and use # noqa: E402 for necessary late imports
  - Run ruff check . --fix to auto-fix some import issues
- Type checking errors:
  - Most are disabled in development, but fix critical ones
  - Check specific files: mypy dnallm/specific_file.py
- Test failures:
  - Run specific test files: pytest tests/specific_test.py -v
  - Run with verbose output: python scripts/check_code.py --verbose
- Script permission errors (Linux/macOS):
  - Make executable: chmod +x scripts/check_code.sh
  - Or use Python script: python scripts/check_code.py
- GitHub Issues: For bug reports and feature requests
- GitHub Discussions: For questions and general discussion
- Documentation: Check the docs/ directory first
- Examples: Look at the example/ directory for usage patterns
Contributors will be recognized in:
- CONTRIBUTORS.md file
- Release notes
- Project documentation
Thank you for contributing to DNALLM! 🧬