Backfill Workflow
This page describes how to perform a document backfill using the Workflow CLI. The backfill workflow migrates documents from your source cluster to your target cluster using Reindex-from-Snapshot (RFS).
A backfill migration follows this sequence:
- Snapshot — Create a point-in-time snapshot of source indexes
- Register — Make the snapshot accessible to the migration tooling
- Metadata — Transfer index mappings, settings, and templates to the target
- RFS Load — Reindex documents from the snapshot to the target cluster
- Cleanup — Remove temporary coordination state
Each phase completes before the next begins. Approval gates between phases let you verify progress before continuing.
Run `workflow configure sample` to see all available options for your version. The configuration covers these categories:
- Clusters: define endpoints, versions, and authentication for each cluster.
- Snapshot storage: configure where snapshots are stored (S3 bucket, path, region, IAM role for AWS managed sources).
- Index allowlists: control which indexes are included at each phase (snapshot creation, metadata migration, and document backfill).
- Performance: tune parallelism and resource limits for the migration pods.
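As a rough illustration of how these categories fit together, a configuration might look like the sketch below. All field names and nesting here are placeholders, not the real schema; run `workflow configure sample` for the authoritative layout in your version:

```yaml
# Illustrative only; field names are placeholders, not the real schema.
source:
  endpoint: https://source.example.com:9200
  auth:
    secretName: source-credentials      # Kubernetes secret (placeholder)
target:
  endpoint: https://target.example.com:443
snapshot:
  s3:
    bucket: my-migration-snapshots
    region: us-east-1
metadata:
  indexAllowlist:
    - regex:logs-.*
documentBackfill:
  indexAllowlist:
    - regex:logs-.*
```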
Allowlist entries are matched as exact literal strings by default. Use the `regex:` prefix for pattern matching:
| Entry | Matches |
|---|---|
| `my-index` | Only "my-index" (exact match) |
| `*` | Only an index literally named "*" (not a wildcard) |
| `regex:.*` | All indexes (regex wildcard) |
| `regex:logs-.*` | "logs-app", "logs-web", etc. |
| `regex:logs-.*-2024` | "logs-app-2024", "logs-web-2024", etc. |
Common mistake: using `*` and expecting it to match all indexes. Use `regex:.*` instead.
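The matching semantics in the table above can be sketched as a small shell function. This is an illustration of the described behavior (exact-literal by default, `regex:` prefix anchored to the full index name), not the tool's actual implementation:

```shell
#!/usr/bin/env bash
# Sketch of allowlist matching: exact literal by default,
# full-name-anchored regex when the entry starts with "regex:".
match_allowlist() {
  local entry="$1" index="$2"
  case "$entry" in
    regex:*) [[ "$index" =~ ^${entry#regex:}$ ]] ;;   # regex, anchored
    *)       [[ "$index" == "$entry" ]] ;;            # exact literal (quoted: no globbing)
  esac
}

match_allowlist "regex:logs-.*" "logs-app" && echo "matched"   # prints "matched"
```

Note that the literal branch quotes `"$entry"`, so `*` compares as a literal character rather than a glob, mirroring the behavior in the table.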
If you already have a snapshot, reference it instead of creating a new one by setting externallyManagedSnapshot in your snapshot configuration. See workflow configure sample for the exact field path.
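As a sketch, the snapshot section might look roughly like this; the exact nesting and sibling fields vary by version, so confirm the field path with `workflow configure sample` before using it:

```yaml
# Illustrative nesting only; verify against `workflow configure sample`.
snapshot:
  externallyManagedSnapshot: true        # reuse an existing snapshot
  snapshotName: my-existing-snapshot     # placeholder name
  s3:
    bucket: my-existing-snapshot-bucket
    region: us-east-1
```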
After the workflow completes, verify the migration:
Use the console's authenticated curl wrapper:

```shell
# Source cluster
console clusters curl source -- "/_cat/indices?v"

# Target cluster
console clusters curl target -- "/_cat/indices?v"
console clusters curl target -- "/<index>/_count"
console clusters curl target -- "/<index>/_settings"
```

Run representative queries against the target to verify data integrity.
1. Check status to identify the failed step: `workflow status`
2. View logs for the failed step: `workflow output`
3. Fix the underlying issue (configuration, permissions, cluster health, etc.)
4. Resubmit: `workflow submit`
RFS tracks progress at the shard level. If a backfill fails partway through:
- Completed shards are recorded in the coordination index
- Resubmitting resumes from the last checkpoint
- Already-migrated documents are not re-processed
This means you don't lose progress on large migrations if a failure occurs.
| Symptom | Likely cause | Resolution |
|---|---|---|
| Snapshot creation fails | S3 permissions, missing IAM role | Check s3RoleArn for AWS managed sources |
| Metadata migration fails | Version incompatibility | Review Migration Paths for supported versions |
| RFS stalls | Target cluster overloaded | Reduce parallelism, check cluster health |
| Authentication errors | Invalid credentials | Verify Kubernetes secrets exist and contain correct values |
RFS runs multiple workers in parallel, each reading shard data directly from the snapshot in S3. Because workers read from object storage — not the source cluster — scaling up workers has zero impact on the source cluster. The only constraint on parallelism is the target cluster's indexing capacity and available Kubernetes resources.
Consider:
- Target cluster indexing capacity (the usual bottleneck)
- Available Kubernetes node resources (CPU, memory for workers)
- S3 read throughput (rarely a bottleneck)
Start with defaults and increase if the target cluster has headroom.
Workflow pods use default resource requests and limits. For large migrations, you may need to adjust:
- CPU and memory for RFS workers
- Storage for temporary data
- Pod count limits in Argo Workflows
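The resources block for a worker pod follows the standard Kubernetes container schema; where it lives in the workflow configuration varies by version, so the top-level keys below are placeholders:

```yaml
# Placeholder location; the resources block itself uses the
# standard Kubernetes container requests/limits schema.
documentBackfill:
  replicas: 4                # parallel RFS workers (cannot exceed shard count)
  resources:
    requests:
      cpu: "2"
      memory: 4Gi
    limits:
      cpu: "4"
      memory: 8Gi
```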
| Factor | Impact |
|---|---|
| Total data volume | Primary factor |
| Number of shards | Determines maximum parallelism (1 worker per shard) |
| Document size | Larger documents = slower indexing |
| Target cluster capacity | Indexing throughput is usually the bottleneck |
| S3 read throughput | Rarely a bottleneck; scales with worker count |
| Network bandwidth | Data transfer speed |
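The factors above can be combined into a back-of-envelope duration estimate. Every number in this sketch is illustrative (assumed values, not benchmarks from the tool):

```shell
# Back-of-envelope duration estimate; all numbers are illustrative.
total_gb=2000          # total source data volume
shards=40              # shard count caps parallelism (1 worker per shard)
workers=10             # planned parallel RFS workers
workers=$(( workers > shards ? shards : workers ))   # cap at shard count
gb_per_worker_hour=25  # assumed sustained per-worker indexing throughput
hours=$(( total_gb / (workers * gb_per_worker_hour) ))
echo "estimated duration: ~${hours}h"   # prints "estimated duration: ~8h"
```

In practice, measure throughput on a small representative index first, then scale the estimate from observed numbers.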
```shell
# Interactive TUI — view progress, approve steps, tail logs
workflow manage

# Check workflow status
workflow status

# Stream logs (use tab-completion on -l to discover available label filters)
workflow output --follow
```

- Workflow CLI Overview - Concepts and full command reference
- Migration Paths - Supported versions and compatibility
- Troubleshooting - Common issues and solutions
Encountering a compatibility issue or missing feature?
- Search existing issues to see if it’s already reported. If it is, feel free to upvote and comment.
- Can’t find it? Create a new issue to let us know.