Bulk storage operations (copy/move) support #22404

@davelopez

Description

This is something we have discussed a couple of times, and recently it was brought up again (can't find the issue comment now), so I sketched a plan with Copilot to see how to implement this and what the missing parts are.


Bulk Storage Operations Plan

Goal

Enable safe, user-driven bulk migration of history data from temporary/expiring storage to durable storage, while reusing existing history bulk-operation architecture and minimizing new API surface.

Scope and Non-Goals

In scope

  • Bulk operations for selected history datasets and dataset collections.
  • Preview, execute, and status reporting for storage operations.
  • Phased rollout: relocate first, then copy, then move.

Out of scope (phase 1)

  • Physical file transfer between stores.
  • Source cleanup and rollback tooling.
  • New parallel bulk framework independent from existing history bulk operations.

Codebase Anchors

This plan is intentionally grounded in the existing implementation.

Design Principles

  • Reuse first: extend existing history bulk operation primitives.
  • Snapshot first: preview and execute must operate on an immutable resolved item set.
  • Per-item truth: every run reports dataset-level status and reason codes.
  • Mode-specific semantics: relocate, copy, and move have different eligibility, quota, and integrity rules.
  • Safe defaults: skip ineligible items with explicit errors; do not fail the whole request by default.

Operation Modes

| Mode | Data movement | Eligibility baseline | Quota effect | Notes |
| --- | --- | --- | --- | --- |
| relocate | Metadata relabel only | Must satisfy current relocate constraints (same-device, ownership/shareability checks) | Quota relabel only if quota source label changes | Fast, no byte transfer |
| copy | Physical copy to target, source retained | Target store must support copy pipeline | Target quota increases by copied bytes | Introduced in phase 2 |
| move | Copy + cutover + source cleanup policy | Same as copy + cleanup eligibility | Target quota increases, source decreases after cleanup | Introduced in phase 3 |

Unified API Strategy

Decision

Do not introduce a separate storage bulk framework. Extend history bulk operations with storage-specific operation types and params.

API shape

Use one family under history contents bulk APIs:

  1. Preview
     • Endpoint: POST /api/histories/{history_id}/contents/bulk/storage/preview
     • Purpose: resolve selection, expand collections, compute eligibility and estimates.
  2. Execute
     • Endpoint: POST /api/histories/{history_id}/contents/bulk/storage/execute
     • Purpose: start an async run using the immutable preview snapshot.
  3. Run status/detail
     • Endpoint: GET /api/histories/{history_id}/contents/bulk/storage/runs/{run_id}
     • Purpose: rich per-item status beyond generic task state.
  4. Task compatibility
     • Continue returning the async task summary id where useful, but treat it as transport status only.
     • Rich operation semantics live in the run model.

Rationale: this preserves existing selection/query behavior while avoiding duplicate endpoint ecosystems.
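
To make the shape concrete, here is a minimal FastAPI-style sketch of the three endpoints. The wiring and handler names are hypothetical; the real routes would extend the existing history contents bulk API service, not live in a standalone router like this.

```python
from fastapi import APIRouter

# Hypothetical wiring: real Galaxy routes would hang off the existing
# history contents bulk service, not a standalone router.
router = APIRouter(prefix="/api/histories/{history_id}/contents/bulk/storage")


@router.post("/preview")
def preview_storage_operation(history_id: str, payload: dict) -> dict:
    # Resolve selection, expand collections, compute eligibility + estimates,
    # and persist an immutable snapshot of the resolved dataset ids.
    return {"snapshot_id": "...", "eligibility": {}, "estimates": {}}


@router.post("/execute")
def execute_storage_operation(history_id: str, payload: dict) -> dict:
    # Start an async run from a previously created snapshot id only;
    # never re-resolve a query selection here.
    return {"run_id": "...", "task": None}


@router.get("/runs/{run_id}")
def get_storage_run(history_id: str, run_id: str) -> dict:
    # Rich per-item status beyond the generic task state.
    return {"run_id": run_id, "items": []}
```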

Request and Response Contracts

Preview request (conceptual)

  • Selection input:
    • explicit items, or
    • query filters (same style as current bulk query selection).
  • Operation params:
    • mode: relocate | copy | move
    • target_object_store_id

Preview response (minimum)

  • snapshot_id
  • selection_counts:
    • selected_items_count
    • expanded_leaf_count
    • unique_dataset_count
  • eligibility:
    • eligible_count
    • ineligible_count
    • per-item entries with reason codes
  • estimates:
    • bytes_to_transfer (copy/move)
    • quota_delta_by_source
  • warnings (non-fatal)
  • expires_at

Execute request

  • snapshot_id
  • execution_policy:
    • skip_ineligible (default: true)
    • max_retries (optional)

Execute response

  • run_id
  • task summary (optional passthrough)
  • initial run summary counts
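
A rough Pydantic sketch of these contracts, directly transcribing the fields above; enum values, optionality, and reason-code strings are illustrative rather than a finalized schema.

```python
from datetime import datetime
from enum import Enum
from typing import Optional

from pydantic import BaseModel


class StorageOperationMode(str, Enum):
    RELOCATE = "relocate"
    COPY = "copy"
    MOVE = "move"


class StoragePreviewRequest(BaseModel):
    # Either explicit item ids or a query filter, mirroring current bulk selection.
    items: Optional[list[str]] = None
    filter_query: Optional[str] = None
    mode: StorageOperationMode
    target_object_store_id: str


class ItemEligibility(BaseModel):
    dataset_id: str
    eligible: bool
    reason_code: Optional[str] = None  # e.g. "dataset_in_use_by_job" (illustrative)


class StoragePreviewResponse(BaseModel):
    snapshot_id: str
    selected_items_count: int
    expanded_leaf_count: int
    unique_dataset_count: int
    eligible_count: int
    ineligible_count: int
    per_item: list[ItemEligibility]
    bytes_to_transfer: Optional[int] = None  # copy/move only
    quota_delta_by_source: dict[str, int] = {}
    warnings: list[str] = []
    expires_at: datetime


class StorageExecuteRequest(BaseModel):
    snapshot_id: str
    skip_ineligible: bool = True
    max_retries: Optional[int] = None
```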

Snapshot Semantics (Critical)

Problem

Query-based selections can drift between preview and execute.

Required behavior

  • Preview resolves concrete dataset ids and stores an immutable snapshot.
  • Execute accepts a snapshot id only.
  • On execute start, revalidate eligibility for each item and report any drift as a per-item ineligible-at-execute reason.
  • Snapshot expiration is required to avoid stale execution.
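
A minimal sketch of the snapshot record and the execute-time guard, assuming a simple TTL-based expiry; the per-item drift revalidation itself would reuse the eligibility checks below.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class SelectionSnapshot:
    # Concrete dataset ids resolved at preview time; never re-queried.
    dataset_ids: frozenset[str]
    mode: str
    target_object_store_id: str
    expires_at: datetime
    snapshot_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def create_snapshot(dataset_ids, mode, target, ttl_minutes=30):
    return SelectionSnapshot(
        dataset_ids=frozenset(dataset_ids),
        mode=mode,
        target_object_store_id=target,
        expires_at=datetime.now(timezone.utc) + timedelta(minutes=ttl_minutes),
    )


def validate_snapshot_for_execute(snapshot: SelectionSnapshot) -> None:
    # A stale snapshot is a hard error; per-item drift, by contrast, is
    # revalidated item-by-item and reported as ineligible-at-execute.
    if datetime.now(timezone.utc) >= snapshot.expires_at:
        raise ValueError(f"snapshot {snapshot.snapshot_id} expired; rerun preview")
```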

Eligibility and Policy Matrix

Baseline checks for all modes

  • User owns the mutable history context for the operation.
  • Dataset/item access and permissions valid.
  • Dataset not blocked by active job usage policy.

Relocate checks (phase 1)

  • Same-device constraint as existing manager logic.
  • Security check equivalent to existing can-change-object-store-id logic.
  • Target object store selectable for current user.

Copy/move checks (phase 2+)

  • Target store capability checks.
  • Metadata/extra-files migration capability check.
  • Quota preflight on target quota source.

Policy defaults

  • Default to skip-ineligible (per-item errors) rather than fail-whole-request.
  • Optional strict mode can fail the request if any item is ineligible.
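
A hedged sketch of how the per-mode check matrix and the skip-vs-strict policy could compose; the check functions here are placeholders for the existing manager logic referenced above.

```python
from typing import Callable, Optional

# Each check returns None when the item passes, or a reason code string.
EligibilityCheck = Callable[[dict], Optional[str]]


def check_not_in_active_job(item: dict) -> Optional[str]:
    return "dataset_in_use_by_job" if item.get("active_job") else None


def check_relocate_same_device(item: dict) -> Optional[str]:
    # Placeholder for the existing can-change-object-store-id constraint.
    return None if item.get("same_device") else "relocate_device_mismatch"


CHECKS_BY_MODE: dict[str, list[EligibilityCheck]] = {
    "relocate": [check_not_in_active_job, check_relocate_same_device],
    # copy/move add capability and quota preflight checks in later phases.
}


def evaluate(items: list[dict], mode: str, strict: bool = False):
    results = []
    for item in items:
        # First failing check wins; None means the item is eligible.
        reason = next(
            (r for check in CHECKS_BY_MODE[mode] if (r := check(item))), None
        )
        results.append((item["dataset_id"], reason))
    if strict and any(reason for _, reason in results):
        raise ValueError("strict mode: at least one item is ineligible")
    return results  # default policy: ineligible items are skipped with reasons
```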

Collection Expansion Rules

  • Always expand collections recursively to leaf datasets for execution.
  • Deduplicate by underlying dataset id before estimation and execution.
  • Report both item-level and leaf-level counts to avoid user confusion.
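
A minimal expansion sketch assuming a simplified nested-collection shape (real HDCA traversal is richer); the point is recursion to leaves plus dedupe by underlying dataset id.

```python
def expand_to_leaf_datasets(selected_items: list[dict]) -> list[dict]:
    """Recursively expand collections to leaf datasets, deduped by dataset id.

    Items are assumed to carry either a "dataset_id" (leaf) or an "elements"
    list (collection); this simplifies, but mirrors, real collection shapes.
    """
    seen: set[str] = set()
    leaves: list[dict] = []

    def walk(item: dict) -> None:
        if "elements" in item:  # collection: recurse into children
            for child in item["elements"]:
                walk(child)
        elif item["dataset_id"] not in seen:  # leaf: keep first occurrence only
            seen.add(item["dataset_id"])
            leaves.append(item)

    for item in selected_items:
        walk(item)
    return leaves


# Report both counts so users understand why "2 selected" may mean "40 datasets":
# item_count = len(selected_items); leaf_count = len(expand_to_leaf_datasets(...))
```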

Quota Semantics

Relocate

  • No byte copy; model as quota-source relabel behavior only where applicable.

Copy

  • Preview estimates target quota increase.
  • Execute enforces preflight and per-item quota checks.

Move

  • Same as copy during transfer.
  • Apply source decrement only after successful cutover/cleanup state transition.
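
A sketch of the per-mode quota accounting, assuming per-item sizes and quota-source labels are already resolved at preview time (field names are illustrative).

```python
from collections import defaultdict


def estimate_quota_delta(items: list[dict], mode: str) -> dict[str, int]:
    """Estimate quota deltas keyed by quota source label.

    relocate: no byte transfer, so deltas apply only when the quota source
    label changes; copy: target grows; move: target grows now, and the
    source decrement is deferred until after verified cleanup.
    """
    delta: dict[str, int] = defaultdict(int)
    for item in items:
        size = item["total_size"]
        src, dst = item["source_quota_label"], item["target_quota_label"]
        if mode == "relocate":
            if src != dst:  # quota-source relabel only where applicable
                delta[src] -= size
                delta[dst] += size
        elif mode == "copy":
            delta[dst] += size
        elif mode == "move":
            delta[dst] += size  # source decrement applied post-cleanup, not here
    return dict(delta)
```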

Metadata and Integrity Requirements (phase 2+)

Physical copy/move must include:

  • Primary dataset file.
  • Extra files directory contents.
  • Metadata files and associated records.
  • Required dataset storage pointers/references.

Verification policy:

  • Configurable strictness.
  • Default: size + existence checks.
  • Optional strict mode: hash verification where available.
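
A sketch of the tiered verification policy for stores reachable as filesystem paths; provider-backed stores would need equivalent checks, and the strict tier adds hash comparison.

```python
import hashlib
from pathlib import Path


def verify_copied_file(source: Path, target: Path, strict: bool = False) -> bool:
    """Baseline: existence + size parity. Strict: additionally compare SHA-256.

    Assumes both sides are reachable as local paths; object stores without
    local paths would need provider-specific equivalents.
    """
    if not target.exists():
        return False
    if source.stat().st_size != target.stat().st_size:
        return False
    if strict:
        return _sha256(source) == _sha256(target)
    return True


def _sha256(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):  # 1 MiB chunks
            digest.update(chunk)
    return digest.hexdigest()
```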

Run Model and Status Reporting

Need

The generic task-state endpoint is insufficient for user-facing bulk migration progress.

Add persistent run model

Run-level fields:

  • run_id, history_id, snapshot_id, mode, target_object_store_id, created_by, timestamps.
  • aggregate counts and bytes.
  • terminal state.

Per-item fields:

  • dataset_id
  • state (pending, running, succeeded, failed, skipped)
  • reason_code and message
  • attempt_count
  • bytes_processed
  • last_updated
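
A dataclass sketch pinning down the run and per-item shapes listed above; a real implementation would presumably persist these as database models, this only fixes the fields.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional


class ItemState(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    SKIPPED = "skipped"


@dataclass
class StorageRunItem:
    dataset_id: str
    state: ItemState = ItemState.PENDING
    reason_code: Optional[str] = None
    message: Optional[str] = None
    attempt_count: int = 0
    bytes_processed: int = 0
    last_updated: Optional[datetime] = None


@dataclass
class StorageRun:
    run_id: str
    history_id: str
    snapshot_id: str
    mode: str
    target_object_store_id: str
    created_by: str
    created_at: datetime
    items: list[StorageRunItem] = field(default_factory=list)

    def summary(self) -> dict[str, int]:
        # Aggregate counts derived from per-item truth, never tracked separately.
        counts: dict[str, int] = {}
        for item in self.items:
            counts[item.state.value] = counts.get(item.state.value, 0) + 1
        return counts
```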

Failure Handling and Recovery

  • Per-item transaction boundaries.
  • Idempotent per-item execution key: (run_id, dataset_id).
  • Retry only failed transient errors up to policy limit.
  • Preserve partial run state for resume.
  • Cleanup states for move are explicit and auditable.
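
A sketch of the per-item execution loop with (run_id, dataset_id) as the idempotency key; already_applied and apply_operation stand in for the real transactional pieces.

```python
import time


class TransientError(Exception):
    """Errors worth retrying (timeouts, temporary store unavailability)."""


def execute_item(run_id: str, dataset_id: str, apply_operation, already_applied,
                 max_retries: int = 3) -> str:
    """Execute one item, keyed by (run_id, dataset_id).

    Resuming a partially completed run skips items whose key already
    succeeded, so a finished operation is never re-applied.
    """
    if already_applied(run_id, dataset_id):
        return "succeeded"  # idempotent resume: nothing to redo
    for attempt in range(1, max_retries + 1):
        try:
            apply_operation(dataset_id)  # one transaction boundary per item
            return "succeeded"
        except TransientError:
            if attempt == max_retries:
                return "failed"  # partial run state preserved for later resume
            time.sleep(2 ** attempt)  # simple backoff before retrying
    return "failed"
```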

Frontend Plan

Entry point

Add a Storage operation to the existing history Selection dropdown flow.

Dialog flow

  1. User chooses mode + target store.
  2. User runs preview.
  3. UI renders:
    • selection and leaf counts,
    • ineligible reasons,
    • estimate and quota impact,
    • warnings.
  4. User confirms execute using preview snapshot.
  5. UI polls run endpoint for per-item progress.

UX requirements

  • Explicit per-mode copy text and impact summaries.
  • Show a clear difference between items blocked at preview and items blocked at execute revalidation.
  • Provide a downloadable error report for large runs.

Phased Roadmap and Exit Criteria

Phase 1: Bulk relocate MVP

Deliver:

  • Preview + execute + run status for relocate mode.
  • Collection leaf expansion + dedupe.
  • Existing relocate constraints mirrored in preview and execute.
  • Snapshot-based execution.

Exit criteria:

  • No query-selection drift in execute (snapshot only).
  • Per-item reason codes for ineligible/failed items.
  • Existing bulk behavior remains unchanged for non-storage operations.

Phase 2: Bulk copy

Deliver:

  • Physical copy pipeline.
  • Metadata + extra files handling.
  • Integrity verification and quota preflight.

Exit criteria:

  • Verified integrity for copied datasets according to policy.
  • Accurate quota estimate and enforcement behavior.
  • Resume/retry validated for partial failures.

Phase 3: Bulk move

Deliver:

  • Move state machine: copy, verify, cutover, cleanup.
  • Explicit cleanup policy and repair tooling.

Exit criteria:

  • No silent data loss on interrupted move.
  • Recoverable and auditable partial runs.

Testing Strategy

Unit

  • Eligibility matrix by mode (see the sketch after this list).
  • Snapshot creation and expiration behavior.
  • Collection expansion/dedup counts.
  • Estimate correctness.
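
For the eligibility matrix, a pytest parametrization sketch; it reuses the hypothetical evaluate() helper from the eligibility section, and the module name is assumed.

```python
import pytest

# Assumed module holding the hypothetical evaluate() sketch from above.
from storage_bulk import evaluate


@pytest.mark.parametrize(
    "item,expected_reason",
    [
        ({"dataset_id": "d1", "active_job": False, "same_device": True}, None),
        ({"dataset_id": "d2", "active_job": True, "same_device": True},
         "dataset_in_use_by_job"),
        ({"dataset_id": "d3", "active_job": False, "same_device": False},
         "relocate_device_mismatch"),
    ],
)
def test_relocate_eligibility(item, expected_reason):
    # One row per matrix cell: eligible, job-blocked, device-mismatched.
    [(dataset_id, reason)] = evaluate([item], mode="relocate")
    assert dataset_id == item["dataset_id"]
    assert reason == expected_reason
```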

Integration

  • Explicit selection and query-selection preview/execute parity.
  • Job-state blocking based on active input/output associations.
  • Relocate constraints parity with existing single-item relocate.
  • Copy/move metadata and extra-files integrity.
  • Quota preflight and execution failures.

Operational

  • Large batch runs and UI polling load.
  • Resume/retry reliability.
  • Provider-specific object store behavior.

Open Decisions (with proposed defaults)

  1. Expose move initially?
     • Default: no; relocate first, then copy, then move.
  2. Should any ineligible item fail the whole request?
     • Default: no; skip ineligible items with per-item errors.
  3. Move cleanup timing?
     • Default: delayed cleanup with an explicit post-verify step.
  4. Integrity strictness?
     • Default: best-effort baseline checks plus optional strict hash mode.

Final Recommendation

  • Implement relocate with preview, immutable snapshots, and run-level status first.
  • Keep one bulk architecture and avoid parallel APIs.
  • Add copy and move only after integrity, quota, and recovery guarantees are proven by tests.
