Backfill Workflow

Andre Kurait edited this page Mar 16, 2026 · 14 revisions

This page describes how to perform document backfill using the Workflow CLI. The backfill workflow migrates documents from your source cluster to your target cluster using snapshot-based reindexing (Reindex-from-Snapshot, or RFS).

The backfill mental model

A backfill migration follows this sequence:

  1. Snapshot — Create a point-in-time snapshot of source indexes
  2. Register — Make the snapshot accessible to the migration tooling
  3. Metadata — Transfer index mappings, settings, and templates to the target
  4. RFS Load — Reindex documents from the snapshot to the target cluster
  5. Cleanup — Remove temporary coordination state

Each phase completes before the next begins. Approval gates between phases let you verify progress before continuing.
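The sequence above can be sketched as a simple driver loop. This is an illustrative model only, not the actual workflow engine: the phase names mirror the list above, and the approve callback stands in for the interactive approval gates the CLI presents between phases.

```python
# Illustrative model of the backfill phase sequence with approval gates.
# Phase names mirror the workflow; approve() stands in for the interactive
# approval the CLI presents between phases.

PHASES = ["snapshot", "register", "metadata", "rfs-load", "cleanup"]

def run_backfill(run_phase, approve):
    """Run each phase to completion, pausing for approval before the next."""
    completed = []
    for phase in PHASES:
        run_phase(phase)          # blocks until the phase finishes
        completed.append(phase)
        if phase != PHASES[-1] and not approve(phase):
            break                 # operator declined to continue
    return completed

# Example: approve everything, record the order phases ran in.
order = []
run_backfill(order.append, lambda p: True)
print(order)  # ['snapshot', 'register', 'metadata', 'rfs-load', 'cleanup']
```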

Configuration categories

Run workflow configure sample to see all available options for your version. The configuration covers these categories:

Source and target clusters

Define endpoints, versions, and authentication for each cluster.

Snapshot repositories

Configure where snapshots are stored (S3 bucket, path, region, IAM role for AWS managed sources).

Index allowlists

Control which indexes are included at each phase:

  • Snapshot creation allowlist
  • Metadata migration allowlist
  • Document backfill allowlist

Resource allocation

Tune parallelism and resource limits for the migration pods.
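The four categories can be pictured as one configuration object. The key names below are hypothetical, chosen only to illustrate the shape; run workflow configure sample to see the real field names for your version.

```python
# Hypothetical sketch of the configuration categories described above.
# All key names are illustrative -- the real schema comes from
# `workflow configure sample`.

config = {
    "source":   {"endpoint": "https://source:9200", "version": "ES 7.10"},
    "target":   {"endpoint": "https://target:9200", "version": "OS 2.x"},
    "snapshot": {"s3_bucket": "my-bucket", "s3_region": "us-east-1"},
    "allowlists": {
        "snapshot": ["regex:logs-.*"],   # snapshot creation
        "metadata": ["regex:logs-.*"],   # metadata migration
        "backfill": ["regex:logs-.*"],   # document backfill
    },
    "resources": {"rfs_workers": 4},
}

# Each phase reads only its own allowlist:
print(config["allowlists"]["backfill"])  # ['regex:logs-.*']
```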

Index allowlist syntax

Allowlist entries are matched as exact literal strings by default. Use the regex: prefix for pattern matching:

| Entry | Matches |
| --- | --- |
| my-index | Only "my-index" (exact match) |
| * | Only an index literally named "*" (not a wildcard) |
| regex:.* | All indexes (regex wildcard) |
| regex:logs-.* | "logs-app", "logs-web", etc. |
| regex:logs-.*-2024 | "logs-app-2024", "logs-web-2024", etc. |

Common mistake: Using * expecting it to match all indexes. Use regex:.* instead.
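The matching rule in the table can be sketched in a few lines. This is an illustrative model, not the tooling's actual matcher; in particular, anchored full-string regex matching is an assumption made to agree with the examples above.

```python
import re

# Sketch of the allowlist rule described above: entries are exact literals
# unless prefixed with "regex:". Full-string anchoring is an assumption
# consistent with the table's examples.

def allowlist_match(entry: str, index: str) -> bool:
    if entry.startswith("regex:"):
        # Anchored full match, so "regex:logs-.*" does not match "app-logs-x".
        return re.fullmatch(entry[len("regex:"):], index) is not None
    return entry == index

print(allowlist_match("my-index", "my-index"))      # True  (exact literal)
print(allowlist_match("*", "my-index"))             # False ("*" is not a wildcard)
print(allowlist_match("regex:.*", "my-index"))      # True  (regex wildcard)
print(allowlist_match("regex:logs-.*", "logs-app")) # True
```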

Using existing snapshots

If you already have a snapshot, reference it instead of creating a new one by setting externallyManagedSnapshot in your snapshot configuration. See workflow configure sample for the exact field path.

Verification

After the workflow completes, verify the migration:

Check document counts

Use the console's authenticated curl wrapper:

# Source cluster
console clusters curl source -- "/_cat/indices?v"

# Target cluster
console clusters curl target -- "/_cat/indices?v"

Compare specific indexes

console clusters curl target -- "/<index>/_count"
console clusters curl target -- "/<index>/_settings"

Test queries

Run representative queries against the target to verify data integrity.
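A count comparison like the one above is easy to script once the per-index counts are in hand. The sketch below assumes you have already collected counts from each cluster (for example via the console's curl wrapper and the /_count API); the dicts are stand-ins for those results.

```python
# Sketch of a per-index document-count comparison. The input dicts stand in
# for counts gathered from each cluster's /_count API.

def compare_counts(source_counts, target_counts):
    """Return indexes whose target count differs from the source count."""
    mismatches = {}
    for index, src in source_counts.items():
        tgt = target_counts.get(index, 0)   # missing on target counts as 0
        if src != tgt:
            mismatches[index] = (src, tgt)
    return mismatches

source = {"logs-app": 10_000, "logs-web": 5_000}
target = {"logs-app": 10_000, "logs-web": 4_998}
print(compare_counts(source, target))  # {'logs-web': (5000, 4998)}
```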

Error recovery

When a workflow fails

  1. Check status to identify the failed step:

    workflow status
  2. View logs for the failed step:

    workflow output
  3. Fix the underlying issue (configuration, permissions, cluster health, etc.)

  4. Resubmit:

    workflow submit

RFS checkpoints

RFS tracks progress at the shard level. If a backfill fails partway through:

  • Completed shards are recorded in the coordination index
  • Resubmitting resumes from the last checkpoint
  • Already-migrated documents are not re-processed

This means you don't lose progress on large migrations if a failure occurs.
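The resume behavior can be modeled in a few lines. This mirrors the checkpointing described above but is not the actual RFS code; the shard naming is illustrative.

```python
# Illustrative model of shard-level checkpointing: completed shards are
# recorded, and a resumed run processes only the shards that remain.

def resume_backfill(all_shards, completed, migrate):
    """Migrate only shards not already recorded as completed."""
    for shard in all_shards:
        if shard in completed:
            continue              # already migrated -- never re-processed
        migrate(shard)
        completed.add(shard)
    return completed

completed = {"index-a/0", "index-a/1"}        # recorded before the failure
shards = ["index-a/0", "index-a/1", "index-a/2", "index-a/3"]
migrated_now = []
resume_backfill(shards, completed, migrated_now.append)
print(migrated_now)  # ['index-a/2', 'index-a/3']
```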

Common failure causes

| Symptom | Likely cause | Resolution |
| --- | --- | --- |
| Snapshot creation fails | S3 permissions, missing IAM role | Check s3RoleArn for AWS managed sources |
| Metadata migration fails | Version incompatibility | Review Migration Paths for supported versions |
| RFS stalls | Target cluster overloaded | Reduce parallelism, check cluster health |
| Authentication errors | Invalid credentials | Verify Kubernetes secrets exist and contain correct values |

Parallelism and resource tuning

Parallelism

RFS runs multiple workers in parallel, each reading shard data directly from the snapshot in S3. Because workers read from object storage rather than the source cluster, scaling up workers places no additional load on the source. The practical constraints on parallelism are the target cluster's indexing capacity and the Kubernetes resources available to the workers.

Consider:

  • Target cluster indexing capacity (the usual bottleneck)
  • Available Kubernetes node resources (CPU, memory for workers)
  • S3 read throughput (rarely a bottleneck)

Start with defaults and increase if the target cluster has headroom.
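Since one worker processes one shard at a time, shard count caps useful parallelism; beyond that, the limit is whatever the target cluster and Kubernetes nodes can absorb. The sketch below is a back-of-envelope calculation with illustrative numbers, not a recommendation.

```python
# Back-of-envelope worker-count sketch: useful parallelism is bounded by
# shard count, Kubernetes pod capacity, and target-cluster headroom.
# All numbers are illustrative.

def useful_workers(shard_count, k8s_worker_capacity, target_headroom_workers):
    return min(shard_count, k8s_worker_capacity, target_headroom_workers)

# 40 shards, room for 16 worker pods, target can absorb ~10 workers' writes:
print(useful_workers(40, 16, 10))  # 10
# Only 5 shards: more than 5 workers buys nothing.
print(useful_workers(5, 16, 10))   # 5
```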

Resource limits

Workflow pods use default resource requests and limits. For large migrations, you may need to adjust:

  • CPU and memory for RFS workers
  • Storage for temporary data
  • Pod count limits in Argo Workflows

Migration duration factors

| Factor | Impact |
| --- | --- |
| Total data volume | Primary factor |
| Number of shards | Determines maximum parallelism (1 worker per shard) |
| Document size | Larger documents = slower indexing |
| Target cluster capacity | Indexing throughput is usually the bottleneck |
| S3 read throughput | Rarely a bottleneck; scales with worker count |
| Network bandwidth | Data transfer speed |
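The factors above combine into a rough duration estimate. The throughput figures below are illustrative assumptions; measure your own target cluster before relying on an estimate like this.

```python
# Rough duration estimate from the factors above. Per-worker throughput is
# an illustrative assumption, not a measured number.

def estimate_hours(total_gb, workers, gb_per_worker_hour):
    # Effective rate scales with workers until the target is saturated.
    return total_gb / (workers * gb_per_worker_hour)

# 2 TB, 8 workers, each sustaining ~25 GB/hour into the target:
print(round(estimate_hours(2048, 8, 25), 1))  # 10.2
```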

Monitoring during backfill

# Interactive TUI — view progress, approve steps, tail logs
workflow manage

# Check workflow status
workflow status

# Stream logs (use tab-completion on -l to discover available label filters)
workflow output --follow
