Common questions about the github-copier application.
The GitHub copier is a GitHub app that automatically copies code examples and files from a source repository to one or more target repositories when pull requests are merged. It features advanced pattern matching, path transformations, and audit logging.
- Automate file synchronization between repositories
- Maintain consistency across multiple documentation repos
- Track changes with audit logging
- Flexible routing with pattern matching
- Transform paths during copying
- Advanced pattern matching (prefix, glob, regex)
- Path transformations with variable substitution
- Multiple target repositories
- Flexible commit strategies (direct or PR)
- Target Repo Batching - All workflows targeting the same repo are combined into one commit/PR
- PR Template Integration - Fetch and merge PR templates from target repos
- File Exclusion - Exclude patterns to filter out unwanted files
- Deprecation tracking
- MongoDB audit logging
- Health and metrics endpoints
- Slack notifications
- Dry-run mode for testing
- Operator UI — Web dashboard at `/operator/` for replay, audit browsing, workflow inspection, and AI-assisted rule generation
A web dashboard served from `/operator/` when `OPERATOR_UI_ENABLED=true`. Five tabs:
- Overview — live metrics, recent activity, health of dependent services
- Webhooks — recent webhook traces with filter/search and one-click replay
- Audit — searchable audit event history with a per-event drawer (trace + logs + replay)
- Workflows — browse the loaded copier config; test path matches with the built-in file match tester
- System — deployment metadata, AI settings, release tagging
Anyone with a GitHub PAT that has access to OPERATOR_AUTH_REPO. The user's permission on that repo determines their UI role:
- `admin` / `maintain` → operator: full access including replay, release, and AI settings
- `write` / `triage` / `read` → writer: view audit, workflows, and recent copies; run the AI rule suggester and file match tester; no replay or release
- No access → 401 Unauthorized
write is deliberately mapped to writer (not operator) so typical docs contributors can't replay deliveries or cut releases just by having repo write access. Operator capability requires an explicit admin / maintain grant.
Paste a source file path and the target file path you want; optionally name the target repo. The server sends the pair plus a structured prompt to the configured LLM, parses the returned JSON, and runs the generated rule through the in-process pattern matcher to verify it actually produces your target from your source. If it doesn't match, the UI shows a "not verified" warning next to the YAML so you can review before copying it into your config.
Two providers are supported:
- Anthropic (default in Cloud Run) — calls the hosted Messages API. In this repo's deploy it routes through the Grove Foundry APIM gateway so no infrastructure needs to be stood up.
- Ollama (local dev) — runs against a local model server. The UI can pull models, switch the active one, and delete models without a redeploy.
To cap cost, the suggester is rate-limited to 30 requests/hour per authenticated user.
Check the banner at startup — it prints the active AI Provider, AI Model, and AI URL. Then:
- Anthropic: make sure `ANTHROPIC_API_KEY` (local) or `ANTHROPIC_API_KEY_SECRET_NAME` (Cloud Run) is set. In Cloud Run, the runtime service account also needs `roles/secretmanager.secretAccessor` on the secret.
- Ollama: confirm `ollama serve` is running on the host at `LLM_BASE_URL` (default `http://localhost:11434`) and that you've pulled a model.
- Use `cmd/test-llm` to exercise the full path outside the UI — it reports Ping, ListModels, and a real GenerateJSON call.
Yes! You can define multiple workflows, each with its own transformations:
```yaml
workflows:
  - name: "Go examples"
    source:
      repo: "owner/source"
      branch: "main"
    destination:
      repo: "owner/target"
      branch: "main"
    transformations:
      - regex:
          pattern: "^examples/go/(?P<file>.+)$"
          transform: "code/go/${file}"
  - name: "Python examples"
    source:
      repo: "owner/source"
      branch: "main"
    destination:
      repo: "owner/target"
      branch: "main"
    transformations:
      - regex:
          pattern: "^examples/python/(?P<file>.+)$"
          transform: "code/python/${file}"
```

Yes. A file can match multiple workflows and be copied to multiple targets. This is useful for copying the same file to different repositories or branches.
Files from all workflows that share the same destination repo are batched into a single commit or PR. The app does not create separate commits/PRs per workflow.
Key behaviors:
- All matched files are combined into one commit tree
- The last workflow's commit strategy wins (PR title, body, commit message, auto-merge setting)
- If one workflow uses `direct` and another uses `pull_request` for the same target, the last strategy wins — they are not separated
Example: If workflow A (direct commit) and workflow B (PR with auto-merge) both target org/docs, the result is a single operation using workflow B's strategy — because it runs last.
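The batching rule above can be sketched as follows. The data shapes are hypothetical and chosen for illustration; the real batcher is part of the app's Go internals:

```python
# Sketch of target-repo batching where the last workflow's strategy wins.
# Data shapes are hypothetical, for illustration only.
def batch_by_target(workflow_results):
    """workflow_results: list of (target_repo, files, strategy) tuples,
    in workflow order. Returns one batch per target repo."""
    batches = {}
    for target_repo, files, strategy in workflow_results:
        batch = batches.setdefault(target_repo, {"files": [], "strategy": None})
        batch["files"].extend(files)     # all files share one commit tree
        batch["strategy"] = strategy     # later workflows overwrite earlier ones
    return batches
```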
To get separate PRs per workflow, use different destination repos or branches:
```yaml
# These create separate PRs because the branches differ:
- name: "go-examples"
  destination: { repo: "org/docs", branch: "copier/go" }
- name: "python-examples"
  destination: { repo: "org/docs", branch: "copier/python" }
```

See Architecture > Target Repo Batching for details.
Main config: Store in a central config repository and set `MAIN_CONFIG_FILE` in `env.yaml`.
Workflow configs: Store in `.copier/workflows/config.yaml` in source repositories, or reference them from the main config.
For local testing: Store config files in the github-copier directory and set appropriate environment variables.
- Prefix - Simple directory matching (e.g., `examples/`)
- Glob - Wildcard matching (e.g., `**/*.go`)
- Regex - Complex patterns with variable extraction (e.g., `^examples/(?P<lang>[^/]+)/.*$`)
See Pattern Matching Guide for details.
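For intuition, the three styles roughly correspond to standard matching primitives. This is a Python sketch, not the app's Go matcher, and glob semantics can differ between engines:

```python
# Rough equivalents of the three pattern types, for intuition only.
import fnmatch
import re

path = "examples/go/main.go"

# Prefix: simple directory matching
assert path.startswith("examples/")

# Glob: wildcard matching (caveat: fnmatch's "*" crosses "/", unlike some glob engines)
assert fnmatch.fnmatch(path, "**/*.go")

# Regex: full control plus variable extraction via named groups
m = re.match(r"^examples/(?P<lang>[^/]+)/.*$", path)
assert m and m.group("lang") == "go"
```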
Use regex patterns with named capture groups:
```yaml
source_pattern:
  type: "regex"
  pattern: "^examples/(?P<lang>[^/]+)/(?P<category>[^/]+)/(?P<file>.+)$"
```

This extracts `lang`, `category`, and `file` variables that you can use in path transformations.
- `${path}` - Full source file path
- `$(unknown)` - Just the filename
- `${dir}` - Directory path
- `${ext}` - File extension (with dot)
- `${name}` - Filename without extension
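These built-ins likely derive from standard path splitting. The sketch below shows how the pieces relate, assuming POSIX-style paths — the derivation is an assumption, not the app's actual code:

```python
# How the built-in variables plausibly decompose a source path.
# Variable names follow the list above; the derivation shown is an assumption.
import posixpath

def builtin_vars(path: str) -> dict:
    dir_, filename = posixpath.split(path)
    name, ext = posixpath.splitext(filename)
    return {
        "path": path,   # full source file path
        "dir": dir_,    # directory path
        "ext": ext,     # file extension, with dot
        "name": name,   # filename without extension
    }
```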
Use the config-validator tool:
```bash
./config-validator test-pattern \
  -type regex \
  -pattern "^examples/(?P<lang>[^/]+)/(?P<file>.+)$" \
  -file "examples/go/main.go"
```

Common issues:
- Pattern doesn't match actual file paths
- Missing `^` or `$` anchors in regex
- Wrong pattern type
- Typos in the pattern
Check actual file paths in logs and test your pattern with config-validator.
Use templates with variable substitution:
```yaml
path_transform: "docs/${lang}/${category}/${file}"
```

Variables come from pattern matching or built-in variables.
Yes, use `${path}`:

```yaml
path_transform: "${path}"
```

Yes, use just the filename:

```yaml
path_transform: "all-examples/$(unknown)"
```

```bash
./config-validator test-transform \
  -source "examples/go/main.go" \
  -template "docs/${lang}/${file}" \
  -vars "lang=go,file=main.go"
```

- Go 1.26+
- GitHub App credentials
- Google Cloud project (for Secret Manager and logging)
- MongoDB Atlas (optional, for audit logging)
Yes! See Local Testing for instructions.
See Deployment Guide for the complete guide including the deployment checklist.
No, MongoDB is optional. It's used for audit logging. You can disable it:
```bash
export AUDIT_ENABLED=false
```

The app uses Google Cloud Secret Manager for storing GitHub credentials. You could modify it to use environment variables instead, but this requires code changes.
- Start the app in dry-run mode:

  ```bash
  DRY_RUN=true CONFIG_FILE=copier-config.yaml make run-local-quick
  ```

- Send a test webhook:

  ```bash
  ./test-webhook -payload testdata/example-pr-merged.json
  ```
See Local Testing for details.
Dry-run mode processes webhooks and matches files but doesn't make actual commits or create PRs. It's perfect for testing configuration changes.
```bash
export DRY_RUN=true
```

Use the test-webhook tool:
```bash
export GITHUB_TOKEN=ghp_your_token
./test-webhook -pr 123 -owner myorg -repo myrepo
```

```bash
./config-validator validate -config copier-config.yaml -v
```

Use the health and metrics endpoints:
```bash
# Health check
curl http://localhost:8080/health

# Metrics
curl http://localhost:8080/metrics
```

Set the Slack webhook URL:

```bash
export SLACK_WEBHOOK_URL="https://hooks.slack.com/services/..."
```

See Slack Notifications for details.
Query MongoDB:
```javascript
use code_copier
db.audit_events.find().sort({timestamp: -1}).limit(10).pretty()
```

- Check Troubleshooting Guide
- Enable debug logging: `export LOG_LEVEL=debug`
- Check application logs
- Use config-validator to test patterns
Yes, use the test-webhook tool:
```bash
./test-webhook -pr 123 -owner myorg -repo myrepo
```

- Direct - Commit directly to target branch
- Pull Request - Create a PR in target repo (with optional auto-merge)
```yaml
commit_strategy:
  type: "pull_request"
  pr_title: "Update examples"
  auto_merge: false
```

Yes! Each rule can have multiple targets:
```yaml
targets:
  - repo: "org/docs-repo"
    branch: "main"
    path_transform: "examples/${file}"
  - repo: "org/website-repo"
    branch: "main"
    path_transform: "static/examples/${file}"
```

When files are deleted in the source repo, they're tracked in a deprecation file in the target repo. This helps you identify files that should be removed.
Yes, use template variables:
```yaml
commit_strategy:
  type: "pull_request"
  commit_message: "Update ${lang} examples from PR #${pr_number}"
  pr_title: "Update ${lang} examples"
  pr_body: "Automated update (${file_count} files)"
```

- `${rule_name}` - Name of the copy rule
- `${source_repo}` - Source repository
- `${target_repo}` - Target repository
- `${source_branch}` - Source branch
- `${target_branch}` - Target branch
- `${file_count}` - Number of files
- `${pr_number}` - PR number
- `${commit_sha}` - Commit SHA
- Plus any variables extracted from pattern matching
Set `use_pr_template: true` in your commit strategy or batch config:
```yaml
commit_strategy:
  type: "pull_request"
  pr_body: |
    🤖 Automated update
    Files: ${file_count}
  use_pr_template: true  # Fetches .github/pull_request_template.md
```

The service will:
- Fetch the PR template from the target repo
- Place the template content first (checklists, guidelines)
- Add a separator (`---`)
- Append your configured content (automation info)
This ensures reviewers see the target repo's review guidelines prominently.
Use exclude_patterns in your source pattern:
```yaml
source_pattern:
  type: "prefix"
  pattern: "examples/"
  exclude_patterns:
    - '\.gitignore$'      # Exclude .gitignore
    - 'node_modules/'     # Exclude dependencies
    - '\.env$'            # Exclude .env files
    - '/dist/'            # Exclude build output
    - '\.test\.(js|ts)$'  # Exclude test files
```

Common use cases:
- Filter out configuration files (`.gitignore`, `.env`)
- Exclude dependencies (`node_modules/`, `vendor/`)
- Skip build artifacts (`/dist/`, `/build/`)
- Exclude test files (`*.test.js`, `*_test.go`)
The app can handle hundreds of files per PR. Performance depends on:
- GitHub API rate limits
- Network latency
- Pattern complexity
- Number of targets
Yes, the app uses proper synchronization for concurrent webhook processing.
GitHub API rate limits apply:
- 5,000 requests/hour for authenticated requests
- Lower limits for unauthenticated requests
GitHub App private key is stored in Google Cloud Secret Manager.
Webhooks are authenticated using HMAC-SHA256 signature verification with a shared secret.
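GitHub's signature scheme is documented and easy to check yourself. This sketch validates an `X-Hub-Signature-256` header against a shared secret (the app's handler is the equivalent logic in Go):

```python
# Verify a GitHub webhook's X-Hub-Signature-256 header (HMAC-SHA256).
# Mirrors GitHub's documented signing scheme.
import hashlib
import hmac

def verify_signature(secret: bytes, payload: bytes, signature_header: str) -> bool:
    expected = "sha256=" + hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison defeats timing attacks.
    return hmac.compare_digest(expected, signature_header)
```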
Yes, for local testing:
```bash
unset WEBHOOK_SECRET
```

Never disable signature verification in production!
Minimum permissions:
- Contents: Read & Write (to read source files and write to target repos)
- Pull Requests: Read & Write (to create PRs)
- Webhooks: Read (to receive webhook events)
Check:
- Pattern matches the file paths
- Configuration is valid
- GitHub App has correct permissions
- Webhook is configured correctly
See Troubleshooting Guide for details.
Check:
- Webhook secret matches
- Signature verification is working
- For local testing, disable signature verification
Check:
- All required environment variables are set
- MongoDB connection (if enabled)
- Google Cloud credentials
- Application logs for errors