grove-platform/github-copier

GitHub Docs Code Example Copier

A GitHub app that automatically copies code examples and files from source repositories to target repositories when pull requests are merged. It provides centralized configuration with distributed workflow management, $ref support for reusable components, advanced pattern matching, and comprehensive monitoring.

Features

Core Functionality

  • Main Config System - Centralized configuration with distributed workflow management
  • Source Context Inference - Workflows automatically inherit source repo/branch
  • $ref Support - Reusable components for transformations, strategies, and excludes
  • Resilient Loading - Continues processing when individual configs fail (logs warnings)
  • Automated File Copying - Copies files from source to target repos on PR merge
  • Advanced Pattern Matching - Prefix, glob, and regex patterns with variable extraction
  • Path Transformations - Template-based path transformations with variable substitution
  • Flexible Commit Strategies - Direct commits or pull requests with auto-merge
  • Deprecation Tracking - Automatic tracking of deleted files

Enhanced Features

  • Workflow References - Local, remote (repo), or inline workflow configs
  • Default Precedence - Workflow > Workflow config > Main config > System defaults
  • Message Templating - Templated commit messages and PR titles
  • PR Template Integration - Fetch and merge PR templates from target repos
  • File Exclusion - Exclude patterns to filter out unwanted files
  • Audit Logging - MongoDB-based event tracking for all operations
  • Health & Metrics - /health, /ready, and /metrics endpoints for monitoring
  • Rate Limit Handling - Automatic GitHub API rate limit detection, backoff, and retry
  • Webhook Idempotency - Deduplication via X-GitHub-Delivery header tracking
  • Structured Logging - JSON structured logging via log/slog (Cloud Logging compatible)
  • Development Tools - Dry-run mode, CLI validation, enhanced logging
  • Thread-Safe - Concurrent webhook processing with proper state management

Operator UI

  • Web dashboard at /operator/ - Five-tab UI (Overview, Webhooks, Audit, Workflows, System) with dark mode, keyboard shortcuts, and shareable URLs
  • GitHub PAT authentication - Users sign in with their personal access token; role is derived from their permission on a configured auth repo (admin/maintain → operator, write/triage/read → writer)
  • Per-repo replay authorization - Replay requires the caller's PAT to have read access to the source repo of the webhook being replayed
  • Writer-facing tools - Workflow browser, PR lookup, recent copies feed, file match tester, audit drawer, per-delivery log viewer
  • AI rule suggester - Paste a source/target pair; get a generated copier rule self-verified against the in-process pattern matcher. Two providers: Anthropic (hosted, default in prod via the Grove Foundry APIM gateway) or Ollama (local, for dev)

🚀 Quick Start

Prerequisites

  • Go 1.26+
  • GitHub App credentials
  • Google Cloud project (for Secret Manager and logging)
  • MongoDB Atlas (optional, for audit logging)

Installation

# Clone the repository
git clone https://github.com/your-org/code-example-tooling.git
cd code-example-tooling/github-copier

# Install dependencies
go mod download

# Build the application
go build -o github-copier .

# Build CLI tools
go build -o config-validator ./cmd/config-validator

Local Configuration

  1. Copy environment example file
cp env.yaml.example env.yaml
  2. Set required environment variables
# GitHub Configuration
GITHUB_APP_ID: "123456"
INSTALLATION_ID: "789012"  # Optional fallback

# Config Repository (where main config lives)
CONFIG_REPO_OWNER: "your-org"
CONFIG_REPO_NAME: "config-repo"
CONFIG_REPO_BRANCH: "main"

# Main Config
MAIN_CONFIG_FILE: ".copier/workflows/main.yaml"
USE_MAIN_CONFIG: "true"

# Secret Manager References
GITHUB_APP_PRIVATE_KEY_SECRET_NAME: "projects/.../secrets/PEM/versions/latest"
WEBHOOK_SECRET_NAME: "projects/.../secrets/webhook-secret/versions/latest"

# Application Settings
WEBSERVER_PATH: "/events"
DEPRECATION_FILE: "deprecated_examples.json"
COMMITTER_NAME: "GitHub Copier App"
COMMITTER_EMAIL: "bot@mongodb.com"

# Feature Flags
AUDIT_ENABLED: "false"
METRICS_ENABLED: "true"
  3. Create main configuration file

Create .copier/workflows/main.yaml in your config repository:

# Main config with global defaults and workflow references
defaults:
  commit_strategy:
    type: "pull_request"
    auto_merge: false
  exclude:
    - "**/.env"
    - "**/node_modules/**"

workflow_configs:
  # Reference workflows in source repo
  - source: "repo"
    repo: "your-org/source-repo"
    branch: "main"
    path: ".copier/workflows/config.yaml"
    enabled: true
  4. Create workflow config in source repository

Create .copier/workflows/config.yaml in your source repository:

workflows:
  - name: "copy-examples"
    # source.repo and source.branch inherited from workflow config reference
    destination:
      repo: "your-org/target-repo"
      branch: "main"
    transformations:
      - move: { from: "examples", to: "docs/examples" }
    commit_strategy:
      type: "pull_request"
      pr_title: "Update code examples"
      use_pr_template: true

Running the Application

# Run with default settings
./github-copier

# Run with custom environment file
./github-copier -env ./configs/.env.production

# Run in dry-run mode (no actual commits)
./github-copier -dry-run

# Validate configuration only
./github-copier -validate

Configuration

See MAIN-CONFIG-README.md for complete configuration documentation.

Main Config Structure

The application uses a three-tier configuration system:

  1. Main Config - Centralized defaults and workflow references
  2. Workflow Configs - Collections of workflows (local, remote, or inline)
  3. Individual Workflows - Specific source → destination mappings
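The tiers resolve in the precedence order listed under Features (Workflow > Workflow config > Main config > System defaults). A minimal sketch of that lookup follows. The `CommitStrategy` type, the `resolveStrategy` function, and the `direct` system default are hypothetical names chosen for illustration, and the sketch assumes a more specific level replaces the whole value rather than merging field by field, which the real loader may not do.

```go
package main

import "fmt"

// CommitStrategy is a hypothetical, simplified stand-in for the real config type.
type CommitStrategy struct {
	Type      string
	AutoMerge bool
}

// resolveStrategy returns the first non-nil strategy, checking the most
// specific level first: workflow, then workflow config, then main config.
// If none is set, it falls back to an assumed system default.
func resolveStrategy(workflow, workflowConfig, mainConfig *CommitStrategy) CommitStrategy {
	for _, s := range []*CommitStrategy{workflow, workflowConfig, mainConfig} {
		if s != nil {
			return *s
		}
	}
	return CommitStrategy{Type: "direct"} // assumed system default
}

func main() {
	mainCfg := &CommitStrategy{Type: "pull_request"}
	workflow := &CommitStrategy{Type: "pull_request", AutoMerge: true}

	fmt.Println(resolveStrategy(nil, nil, mainCfg).AutoMerge)      // false
	fmt.Println(resolveStrategy(workflow, nil, mainCfg).AutoMerge) // true
	fmt.Println(resolveStrategy(nil, nil, nil).Type)               // direct
}
```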

Transformation Types

Move Transformation

Move files from one directory to another:

transformations:
  - move:
      from: "examples/go"
      to: "code/go"

Moves: examples/go/main.go → code/go/main.go
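A move can be thought of as a directory-prefix rewrite. The sketch below illustrates that idea only. `movePath` is a hypothetical helper, and it assumes the prefix must end on a path boundary (so `examples/go` does not match `examples/golang/...`), which may differ from the real matcher.

```go
package main

import (
	"fmt"
	"strings"
)

// movePath rewrites p when it lies under the directory `from`,
// replacing that prefix with `to`. It reports false when p is outside `from`.
func movePath(from, to, p string) (string, bool) {
	from = strings.TrimSuffix(from, "/")
	to = strings.TrimSuffix(to, "/")
	if p == from {
		return to, true
	}
	// Require a "/" after the prefix so sibling directories do not match.
	if rest, ok := strings.CutPrefix(p, from+"/"); ok {
		return to + "/" + rest, true
	}
	return "", false
}

func main() {
	out, ok := movePath("examples/go", "code/go", "examples/go/main.go")
	fmt.Println(out, ok) // code/go/main.go true

	_, ok = movePath("examples/go", "code/go", "examples/golang/main.go")
	fmt.Println(ok) // false
}
```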

Copy Transformation

Copy a single file to a new location:

transformations:
  - copy:
      from: "README.md"
      to: "docs/README.md"

Copies: README.md → docs/README.md

Glob Transformation

Wildcard matching with path transformation:

transformations:
  - glob:
      pattern: "examples/*/main.go"
      transform: "code/${relative_path}"

Matches: examples/go/main.go → code/examples/go/main.go
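To make the glob example concrete, here is a small sketch built on the standard library's `path.Match`. `globTransform` is a hypothetical helper, and it assumes `${relative_path}` stands for the full matched source path, as the example above suggests. Note that `path.Match` does not understand `**`, so the copier's own matcher presumably does more than this.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// globTransform applies template to p when p matches the glob pattern.
// A single "*" never crosses a "/" in path.Match.
func globTransform(pattern, template, p string) (string, bool) {
	ok, err := path.Match(pattern, p)
	if err != nil || !ok {
		return "", false
	}
	return strings.ReplaceAll(template, "${relative_path}", p), true
}

func main() {
	out, ok := globTransform("examples/*/main.go", "code/${relative_path}", "examples/go/main.go")
	fmt.Println(out, ok) // code/examples/go/main.go true

	_, ok = globTransform("examples/*/main.go", "code/${relative_path}", "examples/go/cmd/main.go")
	fmt.Println(ok) // false
}
```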

Regex Transformation

Full regex with named capture groups:

transformations:
  - regex:
      pattern: "^examples/(?P<lang>[^/]+)/(?P<file>.+)$"
      transform: "code/${lang}/${file}"

Matches: examples/go/main.go → code/go/main.go (extracts lang=go, file=main.go)
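The named-group extraction can be reproduced with Go's `regexp` package, which accepts the same `(?P<name>...)` syntax. The sketch below is an illustration, not the copier's implementation: `regexTransform` is a hypothetical helper, and it uses `os.Expand` to substitute `${name}` placeholders, so unknown variables become empty strings.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// regexTransform matches p against pattern, collects the named capture
// groups, and substitutes them into template as ${name} placeholders.
func regexTransform(pattern, template, p string) (string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", err
	}
	m := re.FindStringSubmatch(p)
	if m == nil {
		return "", fmt.Errorf("path %q does not match pattern", p)
	}
	vars := map[string]string{}
	for i, name := range re.SubexpNames() {
		if name != "" { // index 0 and unnamed groups have an empty name
			vars[name] = m[i]
		}
	}
	return os.Expand(template, func(key string) string { return vars[key] }), nil
}

func main() {
	out, err := regexTransform(
		`^examples/(?P<lang>[^/]+)/(?P<file>.+)$`,
		"code/${lang}/${file}",
		"examples/go/main.go",
	)
	fmt.Println(out, err) // code/go/main.go <nil>
}
```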

Path Transformations

Transform source paths to target paths using variables:

path_transform: "docs/${lang}/${category}/${file}"

Built-in Variables:

  • ${path} - Full source path
  • ${filename} - File name only
  • ${dir} - Directory path
  • ${ext} - File extension

Custom Variables:

  • Any named groups from regex patterns
  • Example: (?P<lang>[^/]+) creates ${lang}
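As a rough illustration of the built-in variables, the sketch below derives them from a source path with the standard `path` package. `builtinVars` is a hypothetical helper. Two details are assumptions and not confirmed by this README: that `${filename}` keeps its extension, and that `${ext}` omits the leading dot.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// builtinVars derives the four built-in variables from a slash-separated path.
func builtinVars(p string) map[string]string {
	return map[string]string{
		"path":     p,
		"filename": path.Base(p),                           // assumed to include the extension
		"dir":      path.Dir(p),
		"ext":      strings.TrimPrefix(path.Ext(p), "."), // assumed to omit the dot
	}
}

func main() {
	v := builtinVars("examples/go/main.go")
	fmt.Println(v["path"])     // examples/go/main.go
	fmt.Println(v["filename"]) // main.go
	fmt.Println(v["dir"])      // examples/go
	fmt.Println(v["ext"])      // go
}
```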

Commit Strategies

Direct Commit

commit_strategy:
  type: "direct"
  commit_message: "Update examples from ${source_repo}"

Pull Request

commit_strategy:
  type: "pull_request"
  commit_message: "Update examples"
  pr_title: "Update ${category} examples"
  pr_body: "Automated update from ${source_repo}"
  use_pr_template: true  # Fetch and merge PR template from target repo
  auto_merge: true

Advanced Features

$ref Support for Reusable Components

Extract common configurations into separate files:

# Workflow config
workflows:
  - name: "mflix-java"
    destination:
      repo: "mongodb/sample-app-java-mflix"
      branch: "main"
    transformations:
      $ref: "../transformations/mflix-java.yaml"
    commit_strategy:
      $ref: "../strategies/mflix-pr-strategy.yaml"
    exclude:
      $ref: "../common/mflix-excludes.yaml"

Source Context Inference

Workflows automatically inherit source repo/branch from workflow config reference:

# No need to specify source.repo and source.branch!
workflows:
  - name: "my-workflow"
    # source.repo and source.branch inherited automatically
    destination:
      repo: "mongodb/dest-repo"
      branch: "main"
    transformations:
      - move: { from: "src", to: "dest" }

PR Template Integration

Automatically fetch and merge PR templates from target repositories:

commit_strategy:
  type: "pull_request"
  pr_body: "🤖 Automated update"
  use_pr_template: true  # Fetches .github/pull_request_template.md

File Exclusion

Exclude unwanted files at the workflow or workflow config level:

exclude:
  - "**/.gitignore"
  - "**/node_modules/**"
  - "**/.env"
  - "**/dist/**"
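Go's standard `path.Match` has no `**` support, so one way to reason about these exclude patterns is to translate them into regular expressions. The sketch below does that under stated assumptions: `**/` matches zero or more directories, a trailing `**` matches anything, and `*` stays within one path segment. `globToRegexp` and `excluded` are hypothetical helpers, and the copier's real matcher may treat edge cases differently.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// globToRegexp translates a glob with "**" support into an anchored regexp.
func globToRegexp(glob string) *regexp.Regexp {
	var b strings.Builder
	b.WriteString("^")
	for i := 0; i < len(glob); {
		switch {
		case strings.HasPrefix(glob[i:], "**/"):
			b.WriteString("(?:.*/)?") // zero or more leading directories
			i += 3
		case strings.HasPrefix(glob[i:], "**"):
			b.WriteString(".*") // anything, including "/"
			i += 2
		case glob[i] == '*':
			b.WriteString("[^/]*") // anything within one segment
			i++
		default:
			b.WriteString(regexp.QuoteMeta(glob[i : i+1]))
			i++
		}
	}
	b.WriteString("$")
	return regexp.MustCompile(b.String())
}

// excluded reports whether p matches any of the exclude patterns.
func excluded(patterns []string, p string) bool {
	for _, g := range patterns {
		if globToRegexp(g).MatchString(p) {
			return true
		}
	}
	return false
}

func main() {
	patterns := []string{"**/.gitignore", "**/node_modules/**", "**/.env", "**/dist/**"}
	fmt.Println(excluded(patterns, "web/node_modules/react/index.js")) // true
	fmt.Println(excluded(patterns, ".env"))                            // true
	fmt.Println(excluded(patterns, "examples/go/main.go"))             // false
}
```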

Message Templates

Use variables in commit messages and PR titles:

commit_message: "Update ${category} examples from ${lang}"
pr_title: "Update ${category} examples"

Available Variables:

  • ${rule_name} - Name of the copy rule
  • ${source_repo} - Source repository
  • ${target_repo} - Target repository
  • ${source_branch} - Source branch
  • ${target_branch} - Target branch
  • ${file_count} - Number of files being copied
  • Any custom variables from pattern matching
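The substitution itself can be sketched with `os.Expand`. In this illustration, `renderMessage` is a hypothetical helper, and leaving unknown placeholders untouched is a design choice made here so that a misspelled variable stays visible in the output. The copier may instead render such variables as empty strings.

```go
package main

import (
	"fmt"
	"os"
)

// renderMessage replaces ${name} placeholders with values from vars.
// Placeholders with no value are left in the output unchanged.
func renderMessage(template string, vars map[string]string) string {
	return os.Expand(template, func(key string) string {
		if v, ok := vars[key]; ok {
			return v
		}
		return "${" + key + "}"
	})
}

func main() {
	vars := map[string]string{
		"file_count":  "3",
		"source_repo": "your-org/source-repo",
	}
	fmt.Println(renderMessage("Update ${file_count} files from ${source_repo} (${lang})", vars))
	// Update 3 files from your-org/source-repo (${lang})
}
```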

CLI Tools

Config Validator

Validate and test configurations before deployment:

# Validate config file
./config-validator validate -config copier-config.yaml -v

# Test pattern matching
./config-validator test-pattern \
  -type regex \
  -pattern "^examples/(?P<lang>[^/]+)/(?P<file>.+)$" \
  -file "examples/go/main.go"

# Test path transformation (single quotes stop the shell from expanding ${lang} and ${file})
./config-validator test-transform \
  -template 'docs/${lang}/${file}' \
  -file "examples/go/main.go" \
  -pattern "^examples/(?P<lang>[^/]+)/(?P<file>.+)$"

# Initialize new config from template
./config-validator init -output copier-config.yaml

# Convert between formats
./config-validator convert -input config.json -output copier-config.yaml

Monitoring

Health Endpoint (Liveness)

Basic liveness check:

curl http://localhost:8080/health

Readiness Endpoint

Deep readiness probe (checks GitHub auth, rate limits, MongoDB):

curl http://localhost:8080/ready

Metrics Endpoint

Get performance metrics:

curl http://localhost:8080/metrics

Operator UI

The operator UI is a web dashboard served from /operator/ for diagnosing webhook processing, replaying failed deliveries, browsing workflows, and generating copier rules with AI assistance.

Enabling the UI

Set the required env vars:

OPERATOR_UI_ENABLED: "true"
OPERATOR_AUTH_REPO: "your-org/some-repo"  # user permissions here determine role
OPERATOR_REPO_SLUG: "your-org/some-repo"  # optional; enables audit-row deep links

Startup fails if OPERATOR_UI_ENABLED=true without OPERATOR_AUTH_REPO — this prevents an accidentally-open operator UI.

Authentication and roles

Each user authenticates with their own GitHub Personal Access Token. Paste the PAT into the sign-in prompt; the server checks the user's permission on OPERATOR_AUTH_REPO and assigns a role:

| GitHub permission | Operator UI role | Can do |
| --- | --- | --- |
| admin / maintain | operator | View everything; replay deliveries; cut release tags; change AI settings |
| write / triage / read | writer | View workflows, audit, recent copies, file match tester, AI rule suggester |
| None | denied | 401 Unauthorized |

write maps to writer (not operator) so typical docs contributors with repo write access can't replay deliveries or cut releases — those need an explicit admin / maintain grant.

On top of the role, replay is repo-scoped: the user's PAT must also have read access to the source repo of the webhook being replayed.
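The permission-to-role mapping described above is small enough to express as a single function. The sketch below mirrors the table. `roleFor` is a hypothetical name, and the real server presumably obtains the permission string from the GitHub API before a step like this.

```go
package main

import "fmt"

// roleFor maps a GitHub repository permission on the auth repo to an
// operator UI role. It reports false when access should be denied.
func roleFor(permission string) (string, bool) {
	switch permission {
	case "admin", "maintain":
		return "operator", true
	case "write", "triage", "read":
		return "writer", true
	default:
		return "", false // no permission: the request is rejected with 401
	}
}

func main() {
	for _, p := range []string{"admin", "write", "none"} {
		role, ok := roleFor(p)
		fmt.Printf("%s -> %q allowed=%v\n", p, role, ok)
	}
}
```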

AI rule suggester

The operator UI includes an LLM-backed helper that takes a source/target file pair and returns a generated copier workflow rule, self-verified against the in-process pattern matcher before display.

Two providers are supported via LLM_PROVIDER:

  • anthropic (default in Cloud Run): calls the Anthropic Messages API. For MongoDB deployments this routes through the Grove Foundry APIM gateway — set LLM_BASE_URL=https://grove-gateway-prod.azure-api.net/grove-foundry-prod/anthropic and load the gateway key from Secret Manager via ANTHROPIC_API_KEY_SECRET_NAME.
  • ollama (default for local dev): runs against a local Ollama instance at http://localhost:11434. Connect, pull models, and switch the active model from the UI's System → AI settings panel without a redeploy.

Smoke-test the LLM provider end-to-end with cmd/test-llm.

Audit Logging

When enabled, all operations are logged to MongoDB:

// Query recent copy events
db.audit_events.find({
  event_type: "copy",
  success: true
}).sort({timestamp: -1}).limit(10)

// Find failed operations
db.audit_events.find({
  success: false
}).sort({timestamp: -1})

// Statistics by rule
db.audit_events.aggregate([
  {$match: {event_type: "copy"}},
  {$group: {
    _id: "$rule_name",
    count: {$sum: 1},
    avg_duration: {$avg: "$duration_ms"}
  }}
])

Testing

Run Tests

# Run all tests with race detector
go test -race ./...

# Run specific test suite
go test ./services -v -run TestPatternMatcher

# Run with coverage
go test -race ./services -cover
go test -race ./services -coverprofile=coverage.out
go tool cover -html=coverage.out

Development

Dry-Run Mode

Test without making actual changes:

./github-copier -env .env.test -dry-run

In dry-run mode:

  • Webhooks are received and processed through the full pipeline
  • Files are matched and path transformations are applied
  • GitHub auth failures are tolerated (logged as warnings)
  • No commits, PRs, or file uploads are created

Structured Logging

The app uses log/slog with JSON output. Enable debug logging:

LOG_LEVEL=debug ./github-copier
# or
COPIER_DEBUG=true ./github-copier

Architecture

Project Structure

github-copier/
├── app.go                    # Main application entry point
├── github-app-manifest.yml   # GitHub App permissions documentation
├── cmd/
│   ├── config-validator/     # CLI validation tool
│   ├── test-pem/             # PEM key validation tool
│   └── test-webhook/         # Webhook testing tool
├── configs/
│   ├── environment.go        # Environment configuration
│   ├── .env.local.example    # Local environment template
│   └── copier-config.example.yaml # Config template
├── scripts/
│   ├── release.sh            # Create versioned releases
│   ├── deploy-cloudrun.sh    # Cloud Run deployment
│   ├── ci-local.sh           # Run CI checks locally
│   ├── integration-test.sh   # End-to-end integration tests
│   └── ...                   # Additional helper scripts
├── services/
│   ├── webhook_handler_new.go # Webhook handler (orchestrator)
│   ├── workflow_processor.go  # ProcessWorkflow() - core logic
│   ├── pattern_matcher.go     # Pattern matching engine
│   ├── config_loader.go       # Config loading & validation
│   ├── main_config_loader.go  # Main config with $ref support
│   ├── github_auth.go         # GitHub App authentication
│   ├── github_read.go         # GitHub read operations (REST + GraphQL)
│   ├── github_write_to_target.go # GitHub write operations
│   ├── github_write_to_source.go # Deprecation file updates
│   ├── token_manager.go       # Thread-safe token state management
│   ├── rate_limit.go          # GitHub API rate limit handling
│   ├── delivery_tracker.go    # Webhook idempotency (deduplication)
│   ├── errors.go              # Sentinel errors & classification
│   ├── logger.go              # Structured logging (slog)
│   ├── service_container.go   # Dependency injection container
│   ├── file_state_service.go  # Thread-safe upload/deprecation queues
│   ├── health_metrics.go      # Health, readiness, metrics & config endpoints
│   ├── slack_notifier.go      # Slack notifications
│   └── pr_template_fetcher.go # PR template resolution
├── types/
│   ├── config.go              # Configuration types
│   └── types.go               # Core types
└── docs/
    ├── DEPLOYMENT.md          # Deployment & rollback guide
    ├── CONFIG-REFERENCE.md    # Environment variables & YAML schema
    ├── WEBHOOK-TESTING.md     # Webhook testing guide
    ├── SLACK-NOTIFICATIONS.md # Slack integration guide
    └── ...                    # Additional documentation

Service Container

The application uses dependency injection for clean architecture:

container := NewServiceContainer(config)
// All services initialized and wired together

Releasing

The project uses semantic versioning (vMAJOR.MINOR.PATCH) with GitHub Releases. Pushing a version tag triggers CI to build, test, and deploy to Cloud Run.

Release Workflow

  1. Merge your changes to main.
  2. Run the release script:
# Preview what will happen (no changes made)
./scripts/release.sh v1.2.0 --dry-run

# Create the release
./scripts/release.sh v1.2.0

The script:

  1. Validates the version format and that the working tree is clean on main
  2. Renames the [Unreleased] section in CHANGELOG.md to [v1.2.0] - YYYY-MM-DD
  3. Commits the changelog update and creates an annotated git tag
  4. Pushes the tag to origin — this triggers the CI deploy job
  5. Creates a GitHub Release with the changelog excerpt
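The first step, validating the version format, can be illustrated with a short check for the vMAJOR.MINOR.PATCH shape. This is a sketch in Go and not the logic of `release.sh` itself. `validVersion` is a hypothetical helper, and rejecting leading zeros and pre-release suffixes is an assumption borrowed from semantic versioning conventions.

```go
package main

import (
	"fmt"
	"regexp"
)

// versionRE matches tags of the form vMAJOR.MINOR.PATCH with no leading zeros.
var versionRE = regexp.MustCompile(`^v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$`)

// validVersion reports whether tag looks like a release version tag.
func validVersion(tag string) bool {
	return versionRE.MatchString(tag)
}

func main() {
	for _, tag := range []string{"v1.2.0", "1.2.0", "v1.2", "v01.2.0"} {
		fmt.Printf("%-8s %v\n", tag, validVersion(tag))
	}
}
```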

CI Deploy Pipeline

The deploy job in .github/workflows/ci.yml runs only on version tag pushes:

  • Authenticates to Google Cloud via Workload Identity Federation
  • Deploys to Cloud Run with the version stamped as a build arg (VERSION)
  • Tags the Cloud Run revision with the version for easy rollback

Version Stamping

The version tag is injected at build time via -ldflags:

go build -ldflags "-X main.Version=v1.2.0" -o github-copier .

The version appears in the startup banner and the /health endpoint response.

Deployment

See DEPLOYMENT.md for the complete deployment and rollback guide.

Security

  • Webhook Signature Verification - HMAC-SHA256 validation
  • Webhook Idempotency - Duplicate delivery detection via X-GitHub-Delivery
  • Secret Management - Google Cloud Secret Manager
  • Least Privilege - Minimal GitHub App permissions (see github-app-manifest.yml)

Documentation

Getting Started

Reference

Features

Tools

  • Config Validator - CLI tool for validating configs
  • Test Webhook - CLI tool for testing webhooks
  • Test PEM - CLI tool for verifying the GitHub App private key
  • Test LLM - CLI tool for smoke-testing the AI rule suggester's LLM provider
  • Scripts - Helper scripts for deployment, testing, and releases

About

Tooling to copy source files from docs-maintained monorepos to discrete artifact repositories.
