Retire marin-tmp-* scratch buckets; apply TTL lifecycle to main buckets #5264

@ravwojdyla-agent

Description

PR #2747 (commit 3a0deaf) provisioned per-region marin-tmp-{region} scratch buckets with ttl=Nd/-prefix lifecycle rules. The original motivation in #1721 was twofold: auto-cleanup of intermediate data, and enabling soft-delete on main buckets. The latter never happened: docs/tutorials/storage-bucket.md:50 tells users to disable soft-delete on main buckets. The TTL-prefix mechanism works on any bucket, so the separate buckets now add complexity (a region mapping kept in sync between lib/rigging and infra/configure_temp_buckets.py, a tmp- special case in scripts/ops/cross_region.py, extra IAM) without any remaining benefit.

Proposal: apply the same ttl=Nd/-prefix delete rules to existing marin-{region} buckets, repoint marin_temp_bucket() (lib/rigging/src/rigging/filesystem.py:166) to write under gs://marin-{region}/ttl=Nd/..., and decommission marin-tmp-*.
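
To illustrate the repointed layout, here is a minimal sketch that builds a temp path under the main bucket. The helper name ttl_temp_path, its parameters, and the example region are hypothetical. The real signature of marin_temp_bucket() is not reproduced here and may differ.

```python
def ttl_temp_path(region: str, ttl_days: int, prefix: str = "") -> str:
    """Build a temp path under the main regional bucket (hypothetical helper).

    Assumes the main bucket is named marin-{region} and that a lifecycle
    rule deletes objects under ttl={N}d/ after N days.
    """
    if ttl_days <= 0:
        raise ValueError("ttl_days must be positive")
    base = f"gs://marin-{region}/ttl={ttl_days}d"
    suffix = prefix.strip("/")
    return f"{base}/{suffix}" if suffix else base


# Example: a 30-day compilation cache location in an example region.
cache_path = ttl_temp_path("us-central2", 30, "compilation-cache")
```

Keeping the TTL in the path prefix means callers only pick a retention period. Deletion itself stays with the bucket's lifecycle rules.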

Current usage of marin_temp_bucket():

  • lib/iris/src/iris/cli/cluster.py:317 (iris ephemeral state)
  • lib/marin/src/marin/training/training.py:109,250 (compilation cache + training paths)
  • lib/marin/src/marin/execution/disk_cache.py:101
  • lib/zephyr/src/zephyr/execution.py:1580 (zephyr chunk storage)
  • experiments/ferries/datakit_*.py
  • experiments/dedup/poc_nemotron.py
  • scripts/datakit/run_source_sampling.py
  • .github/workflows/marin-datakit-*.yaml

Open questions:

Definition of Done

  • infra/configure_temp_buckets.py (renamed/generalized) merges ttl=Nd/-prefix rules into marin-{region} buckets without clobbering unrelated lifecycle rules.
  • REGION_TO_TMP_BUCKET in lib/rigging/src/rigging/filesystem.py repointed to main buckets, or folded into REGION_TO_DATA_BUCKET.
  • scripts/ops/cross_region.py:155-160 tmp- special case removed.
  • lib/iris/scripts/setup_iam.py:482 IAM probe path updated.
  • Tests updated: lib/iris/tests/test_marin_fs.py, tests/test_grug_launch_checkpoint_paths.py, tests/test_training.py, tests/execution/test_disk_cache.py.
  • CI workflows updated: .github/workflows/marin-datakit-{smoke,nemotron-ferry}.yaml.
  • Docs updated: docs/tutorials/storage-bucket.md (Step 4), experiments/ferries/OPS.md.
  • marin-tmp-* buckets drained and deleted after the transition (at most 30d, the compilation cache TTL).
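
The first item above (merging without clobbering) could look roughly like the sketch below. It operates on lifecycle rules in the GCS JSON shape (an action plus a condition with age and matchesPrefix). The function names and the TTL values are assumptions, not the contents of infra/configure_temp_buckets.py.

```python
import re

# A rule counts as "ours" only if it is a Delete rule whose prefixes all
# look like ttl=<N>d/ ; everything else is treated as unrelated.
_TTL_PREFIX = re.compile(r"ttl=\d+d/")


def ttl_rule(days: int) -> dict:
    """Delete objects under ttl={days}d/ once they are {days} days old."""
    return {
        "action": {"type": "Delete"},
        "condition": {"age": days, "matchesPrefix": [f"ttl={days}d/"]},
    }


def is_ttl_rule(rule: dict) -> bool:
    prefixes = rule.get("condition", {}).get("matchesPrefix", [])
    return (
        rule.get("action", {}).get("type") == "Delete"
        and bool(prefixes)
        and all(_TTL_PREFIX.fullmatch(p) for p in prefixes)
    )


def merge_ttl_rules(existing: list[dict], ttl_days: list[int]) -> list[dict]:
    """Keep unrelated rules untouched; replace the managed ttl= rules."""
    unrelated = [r for r in existing if not is_ttl_rule(r)]
    return unrelated + [ttl_rule(d) for d in sorted(set(ttl_days))]
```

Replacing the managed ttl= rules wholesale, rather than appending, keeps the operation idempotent: running it twice with the same TTL values yields the same rule list.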

cc @dlwh @rjpower
