This is something we have discussed a couple of times, and recently it was brought up again (I can't find the issue comment now), so I sketched a plan with Copilot to see how to implement this and what the missing parts are.
Bulk Storage Operations Plan
Goal
Enable safe, user-driven bulk migration of history data from temporary/expiring storage to durable storage, while reusing existing history bulk-operation architecture and minimizing new API surface.
Scope and Non-Goals
In scope
- Bulk operations for selected history datasets and dataset collections.
- Preview, execute, and status reporting for storage operations.
- Phased rollout: relocate first, then copy, then move.
Out of scope (phase 1)
- Physical file transfer between stores.
- Source cleanup and rollback tooling.
- New parallel bulk framework independent from existing history bulk operations.
Codebase Anchors
This plan is intentionally grounded in existing implementation:
- Existing history bulk endpoint and service:
- Existing bulk payload/operation schema:
- Existing single-dataset relocate constraints:
- Existing in-use/job concurrency guard:
- Existing frontend bulk selection flow:
- Existing task state endpoints:
Design Principles
- Reuse first: extend existing history bulk operation primitives.
- Snapshot first: preview and execute must operate on an immutable resolved item set.
- Per-item truth: every run reports dataset-level status and reason codes.
- Mode-specific semantics: relocate, copy, and move have different eligibility, quota, and integrity rules.
- Safe defaults: skip ineligible items with explicit errors; do not fail whole request by default.
Operation Modes
| Mode | Data movement | Eligibility baseline | Quota effect | Notes |
| --- | --- | --- | --- | --- |
| relocate | Metadata relabel only | Must satisfy current relocate constraints (same-device, ownership/shareability checks) | Quota relabel only if quota source label changes | Fast, no byte transfer |
| copy | Physical copy to target, source retained | Target store must support copy pipeline | Target quota increases by copied bytes | Introduced in phase 2 |
| move | Copy + cutover + source cleanup policy | Same as copy + cleanup eligibility | Target quota increases, source decreases after cleanup | Introduced in phase 3 |
Unified API Strategy
Decision
Do not introduce a separate storage bulk framework. Extend history bulk operations with storage-specific operation types and params.
API shape
Use one family under history contents bulk APIs:
- Preview
  - Endpoint: `POST /api/histories/{history_id}/contents/bulk/storage/preview`
  - Purpose: resolve selection, expand collections, compute eligibility and estimates.
- Execute
  - Endpoint: `POST /api/histories/{history_id}/contents/bulk/storage/execute`
  - Purpose: start async run using immutable preview snapshot.
- Run status/detail
  - Endpoint: `GET /api/histories/{history_id}/contents/bulk/storage/runs/{run_id}`
  - Purpose: rich per-item status beyond generic task state.
- Task compatibility
  - Continue returning the async task summary id where useful, but treat it as transport status only.
  - Rich operation semantics live in the run model.
Rationale: this preserves existing selection/query behavior while avoiding duplicate endpoint ecosystems.
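To make the shape concrete, here is a minimal client-side sketch of the preview → execute → poll flow. The endpoint paths are the ones proposed above; the payload field names anticipate the contracts in the next section and are illustrative, not final.

```python
import requests

BASE = "https://galaxy.example.org/api"  # hypothetical instance
HEADERS = {"x-api-key": "<api-key>"}
history_id = "abc123"

# 1. Preview: resolve the selection and get an immutable snapshot id.
preview = requests.post(
    f"{BASE}/histories/{history_id}/contents/bulk/storage/preview",
    headers=HEADERS,
    json={
        "mode": "relocate",
        "target_object_store_id": "durable_store",
        "items": [{"id": "dataset_1", "history_content_type": "dataset"}],
    },
).json()

# 2. Execute: accepts the snapshot id only, never a live query.
run = requests.post(
    f"{BASE}/histories/{history_id}/contents/bulk/storage/execute",
    headers=HEADERS,
    json={
        "snapshot_id": preview["snapshot_id"],
        "execution_policy": {"skip_ineligible": True},
    },
).json()

# 3. Poll the run endpoint for rich per-item progress.
status = requests.get(
    f"{BASE}/histories/{history_id}/contents/bulk/storage/runs/{run['run_id']}",
    headers=HEADERS,
).json()
print(status)
```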
Request and Response Contracts
Preview request (conceptual)
- Selection input:
  - explicit items, or
  - query filters (same style as current bulk query selection).
- Operation params:
  - `mode`: relocate | copy | move
  - `target_object_store_id`
Preview response (minimum)
- `snapshot_id`
- `selection_counts`:
  - `selected_items_count`
  - `expanded_leaf_count`
  - `unique_dataset_count`
- `eligibility`:
  - `eligible_count`
  - `ineligible_count`
  - per-item entries with reason codes
- `estimates`:
  - `bytes_to_transfer` (copy/move)
  - `quota_delta_by_source`
- `warnings` (non-fatal)
- `expires_at`
Execute request
- `snapshot_id`
- `execution_policy`:
  - `skip_ineligible` (default: true)
  - `max_retries` (optional)
Execute response
- `run_id`
- task summary (optional passthrough)
- initial run summary counts
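A rough Pydantic sketch of these contracts; model and field names are illustrative and would need to be reconciled with the existing bulk-operation schema:

```python
from datetime import datetime
from enum import Enum
from typing import Optional

from pydantic import BaseModel


class StorageOperationMode(str, Enum):
    relocate = "relocate"
    copy = "copy"
    move = "move"


class IneligibleItem(BaseModel):
    dataset_id: str
    reason_code: str
    message: str


class StoragePreviewResponse(BaseModel):
    snapshot_id: str
    selected_items_count: int
    expanded_leaf_count: int
    unique_dataset_count: int
    eligible_count: int
    ineligible_count: int
    ineligible_items: list[IneligibleItem] = []
    bytes_to_transfer: Optional[int] = None  # copy/move only
    quota_delta_by_source: dict[str, int] = {}
    warnings: list[str] = []
    expires_at: datetime


class ExecutionPolicy(BaseModel):
    skip_ineligible: bool = True
    max_retries: Optional[int] = None


class StorageExecuteRequest(BaseModel):
    snapshot_id: str
    execution_policy: ExecutionPolicy = ExecutionPolicy()
```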
Snapshot Semantics (Critical)
Problem
Query-based selections can drift between preview and execute.
Required behavior
- Preview resolves concrete dataset ids and stores an immutable snapshot.
- Execute accepts a snapshot id only.
- On execute start, revalidate eligibility for each item and report any drift as a per-item ineligible-at-execute reason.
- Snapshot expiration is required to avoid stale execution.
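A minimal sketch of this snapshot lifecycle, assuming snapshots store resolved dataset ids plus a creation timestamp (the TTL value and the storage mechanism are open implementation details):

```python
from datetime import datetime, timedelta, timezone
from typing import Callable

SNAPSHOT_TTL = timedelta(hours=24)  # assumed default; should be configurable


def create_snapshot(resolved_dataset_ids: list[str]) -> dict:
    """Freeze the resolved selection so execute cannot drift from preview."""
    return {
        "dataset_ids": sorted(set(resolved_dataset_ids)),
        "created_at": datetime.now(timezone.utc),
    }


def validate_snapshot(snapshot: dict) -> None:
    """Reject stale snapshots: execute must re-run preview instead."""
    if datetime.now(timezone.utc) - snapshot["created_at"] > SNAPSHOT_TTL:
        raise ValueError("snapshot expired; run preview again")


def revalidate_items(snapshot: dict, is_eligible: Callable[[str], bool]):
    """Re-check each item at execute start; drift becomes a per-item reason."""
    runnable, drifted = [], []
    for dataset_id in snapshot["dataset_ids"]:
        if is_eligible(dataset_id):
            runnable.append(dataset_id)
        else:
            drifted.append(
                {"dataset_id": dataset_id, "reason_code": "ineligible_at_execute"}
            )
    return runnable, drifted
```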
Eligibility and Policy Matrix
Baseline checks for all modes
- User owns mutable history context for operation.
- Dataset/item access and permissions valid.
- Dataset not blocked by active job usage policy.
Relocate checks (phase 1)
- Same-device constraint as existing manager logic.
- Security check equivalent to existing can-change-object-store-id logic.
- Target object store selectable for current user.
Copy/move checks (phase 2+)
- Target store capability checks.
- Metadata/extra-files migration capability check.
- Quota preflight on target quota source.
Policy defaults
- Default skip-ineligible (per-item errors), not fail-whole-request.
- Optional strict mode can fail request if any item is ineligible.
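For illustration, the per-item check for the relocate checks above might look like the following; the reason codes and the attributes on `dataset` and `target_store` are placeholders, not Galaxy's actual model fields:

```python
from typing import Optional


def check_relocate_eligibility(dataset, user, target_store) -> Optional[str]:
    """Return a reason code if ineligible, or None if relocate may proceed."""
    if not dataset.user_can_manage(user):
        return "not_owner"
    if dataset.has_active_jobs:
        return "dataset_in_use"  # in-use/job concurrency guard
    if not target_store.user_selectable_by(user):
        return "target_not_selectable"
    if dataset.device_id != target_store.device_id:
        return "different_device"  # relocate is a metadata relabel only
    return None
```

Returning reason codes rather than booleans keeps preview and execute reporting symmetric with the per-item truth principle above.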
Collection Expansion Rules
- Always expand collections recursively to leaf datasets for execution.
- Deduplicate by underlying dataset id before estimation and execution.
- Report both item-level and leaf-level counts to avoid user confusion.
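A sketch of the expansion and dedupe steps; `elements` and `dataset_id` are stand-ins for the real collection/dataset attributes:

```python
def expand_to_leaves(items):
    """Recursively expand collections to leaf datasets."""
    leaves = []

    def walk(item):
        elements = getattr(item, "elements", None)
        if elements:  # collection node: recurse into its elements
            for element in elements:
                walk(element)
        else:  # leaf dataset
            leaves.append(item)

    for item in items:
        walk(item)
    return leaves


def dedupe_by_dataset_id(leaves):
    """The same dataset can appear in several collections; migrate it once."""
    seen, unique = set(), []
    for leaf in leaves:
        if leaf.dataset_id not in seen:
            seen.add(leaf.dataset_id)
            unique.append(leaf)
    return unique
```

Reporting `len(leaves)` alongside `len(unique)` gives the leaf-level and unique-dataset counts called for above.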
Quota Semantics
Relocate
- No byte copy; model as quota-source relabel behavior only where applicable.
Copy
- Preview estimates target quota increase.
- Execute enforces preflight and per-item quota checks.
Move
- Same as copy during transfer.
- Apply source decrement only after successful cutover/cleanup state transition.
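A sketch of how preview could aggregate quota deltas per quota source under these rules (`ds.quota_source` and `ds.total_size` are illustrative attribute names):

```python
def estimate_quota_deltas(datasets, mode: str, target_source: str) -> dict[str, int]:
    """Aggregate per-quota-source byte deltas for the preview estimate."""
    deltas: dict[str, int] = {}

    def add(source: str, amount: int) -> None:
        deltas[source] = deltas.get(source, 0) + amount

    for ds in datasets:
        if mode == "relocate":
            # No bytes move; relabel only if the quota source label changes.
            if ds.quota_source != target_source:
                add(target_source, ds.total_size)
                add(ds.quota_source, -ds.total_size)
        elif mode in ("copy", "move"):
            add(target_source, ds.total_size)
            if mode == "move":
                # Source decrement is projected only; it applies after
                # the cutover/cleanup state transition succeeds.
                add(ds.quota_source, -ds.total_size)
    return deltas
```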
Metadata and Integrity Requirements (phase 2+)
Physical copy/move must include:
- Primary dataset file.
- Extra files directory contents.
- Metadata files and associated records.
- Required dataset storage pointers/references.
Verification policy:
- Configurable strictness.
- Default: size + existence checks.
- Optional strict mode: hash verification where available.
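A minimal sketch of that verification policy for a single file pair, with the default size + existence checks and the opt-in hash comparison:

```python
import hashlib
import os


def verify_copied_file(src_path: str, dst_path: str, strict: bool = False) -> bool:
    """Default policy: existence + size; strict mode adds a hash comparison."""
    if not os.path.exists(dst_path):
        return False
    if os.path.getsize(src_path) != os.path.getsize(dst_path):
        return False
    if strict:
        def sha256(path: str) -> str:
            digest = hashlib.sha256()
            with open(path, "rb") as fh:
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            return digest.hexdigest()
        return sha256(src_path) == sha256(dst_path)
    return True
```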
Run Model and Status Reporting
Need
The generic task state endpoint is insufficient for user-facing bulk migration progress.
Add persistent run model
Run-level fields:
- `run_id`, `history_id`, `snapshot_id`, `mode`, `target_object_store_id`, `created_by`, timestamps.
- Aggregate counts and bytes.
- Terminal state.
Per-item fields:
- `dataset_id`
- `state` (pending, running, succeeded, failed, skipped)
- `reason_code` and message
- `attempt_count`
- `bytes_processed`
- `last_updated`
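Sketched as plain dataclasses for illustration (the real model would presumably be a persisted SQLAlchemy table):

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional


class ItemState(str, Enum):
    pending = "pending"
    running = "running"
    succeeded = "succeeded"
    failed = "failed"
    skipped = "skipped"


@dataclass
class StorageRunItem:
    dataset_id: str
    state: ItemState = ItemState.pending
    reason_code: Optional[str] = None
    message: Optional[str] = None
    attempt_count: int = 0
    bytes_processed: int = 0
    last_updated: Optional[datetime] = None


@dataclass
class StorageRun:
    run_id: str
    history_id: str
    snapshot_id: str
    mode: str
    target_object_store_id: str
    created_by: str
    created_at: datetime
    items: list[StorageRunItem] = field(default_factory=list)

    def summary_counts(self) -> dict[str, int]:
        counts: dict[str, int] = {}
        for item in self.items:
            counts[item.state.value] = counts.get(item.state.value, 0) + 1
        return counts
```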
Failure Handling and Recovery
- Per-item transaction boundaries.
- Idempotent per-item execution key: `(run_id, dataset_id)` (see the sketch after this list).
- Retry only transient failures, up to the policy limit.
- Preserve partial run state for resume.
- Cleanup states for move are explicit and auditable.
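Building on the run-model sketch above, per-item execution could look like this; `do_migrate` and `TransientError` are hypothetical stand-ins for the real migration call and its retryable failure type:

```python
class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, throttling)."""


def execute_item(item, do_migrate, max_retries: int = 3) -> None:
    # ItemState is the enum from the run-model sketch above.
    if item.state in (ItemState.succeeded, ItemState.skipped):
        return  # idempotent: (run_id, dataset_id) completes at most once
    while item.attempt_count <= max_retries:
        item.attempt_count += 1
        try:
            do_migrate(item.dataset_id)
            item.state = ItemState.succeeded
            return
        except TransientError as exc:
            item.reason_code, item.message = "transient_error", str(exc)
        except Exception as exc:
            item.state = ItemState.failed
            item.reason_code, item.message = "permanent_error", str(exc)
            return
    item.state = ItemState.failed  # transient retries exhausted
```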
Frontend Plan
Entry point
Add a Storage operation to the existing history Selection dropdown flow.
Dialog flow
- User chooses mode + target store.
- User runs preview.
- UI renders:
  - selection and leaf counts,
  - ineligible reasons,
  - estimate and quota impact,
  - warnings.
- User confirms execute using the preview snapshot.
- UI polls the run endpoint for per-item progress.
UX requirements
- Explicit mode copy text and impact summaries.
- Show a clear difference between items blocked at preview and items blocked at execute revalidation.
- Provide downloadable error report for large runs.
Phased Roadmap and Exit Criteria
Phase 1: Bulk relocate MVP
Deliver:
- Preview + execute + run status for relocate mode.
- Collection leaf expansion + dedupe.
- Existing relocate constraints mirrored in preview and execute.
- Snapshot-based execution.
Exit criteria:
- No query-selection drift in execute (snapshot only).
- Per-item reason codes for ineligible/failed items.
- Existing bulk behavior remains unchanged for non-storage operations.
Phase 2: Bulk copy
Deliver:
- Physical copy pipeline.
- Metadata + extra files handling.
- Integrity verification and quota preflight.
Exit criteria:
- Verified integrity for copied datasets according to policy.
- Accurate quota estimate and enforcement behavior.
- Resume/retry validated for partial failures.
Phase 3: Bulk move
Deliver:
- Move state machine: copy, verify, cutover, cleanup.
- Explicit cleanup policy and repair tooling.
Exit criteria:
- No silent data loss on interrupted move.
- Recoverable and auditable partial runs.
Testing Strategy
Unit
- Eligibility matrix by mode.
- Snapshot creation and expiration behavior.
- Collection expansion/dedup counts.
- Estimate correctness.
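For example, the eligibility matrix lends itself to a parameterized test; this sketch uses fake objects and a stand-in `relocate_reason` check rather than the real manager logic:

```python
from types import SimpleNamespace

import pytest


def make_dataset(**overrides):
    """Fake dataset covering one cell of the eligibility matrix."""
    base = dict(owned=True, has_active_jobs=False, same_device=True)
    base.update(overrides)
    return SimpleNamespace(**base)


def relocate_reason(dataset):
    """Stand-in mirroring the relocate checks above."""
    if not dataset.owned:
        return "not_owner"
    if dataset.has_active_jobs:
        return "dataset_in_use"
    if not dataset.same_device:
        return "different_device"
    return None


@pytest.mark.parametrize(
    "dataset, expected_reason",
    [
        (make_dataset(), None),
        (make_dataset(has_active_jobs=True), "dataset_in_use"),
        (make_dataset(same_device=False), "different_device"),
        (make_dataset(owned=False), "not_owner"),
    ],
)
def test_relocate_eligibility_matrix(dataset, expected_reason):
    assert relocate_reason(dataset) == expected_reason
```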
Integration
- Explicit selection and query-selection preview/execute parity.
- Job-state blocking based on active input/output associations.
- Relocate constraints parity with existing single-item relocate.
- Copy/move metadata and extra-files integrity.
- Quota preflight and execution failures.
Operational
- Large batch runs and UI polling load.
- Resume/retry reliability.
- Provider-specific object store behavior.
Open Decisions (with proposed defaults)
- Expose move initially?
  - Default: no; relocate first, then copy, then move.
- Should any ineligible item fail the whole request?
  - Default: no; skip ineligible items with per-item errors.
- Move cleanup timing?
  - Default: delayed cleanup with an explicit post-verify step.
- Integrity strictness?
  - Default: best-effort baseline checks plus an optional strict hash mode.
Final Recommendation
- Implement relocate with preview, immutable snapshots, and run-level status first.
- Keep one bulk architecture and avoid parallel APIs.
- Add copy and move only after integrity, quota, and recovery guarantees are proven by tests.