Description
PR #2747 (commit 3a0deaf) provisioned per-region marin-tmp-{region} scratch buckets with ttl=Nd/-prefix lifecycle rules. The original motivation in #1721 was twofold: auto-cleanup of intermediate data, and enabling soft-delete on main buckets. The latter never happened — docs/tutorials/storage-bucket.md:50 tells users to disable soft-delete on main buckets. The TTL-prefix mechanism works on any bucket, so the separate buckets add complexity (mapping kept in sync between lib/rigging and infra/configure_temp_buckets.py, tmp- special case in scripts/ops/cross_region.py, extra IAM) without a remaining benefit.
Proposal: apply the same ttl=Nd/-prefix delete rules to existing marin-{region} buckets, repoint marin_temp_bucket() (lib/rigging/src/rigging/filesystem.py:166) to write under gs://marin-{region}/ttl=Nd/..., and decommission marin-tmp-*.
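The rule merge implied by the proposal could look roughly like the sketch below. It builds one GCS lifecycle rule per TTL (a Delete action conditioned on age and matchesPrefix) and merges those rules into a bucket's existing rule list while leaving unrelated rules untouched. The function names, the rule-list representation, and the TTL values other than 30d are assumptions for illustration; this is not the code in infra/configure_temp_buckets.py.

```python
from typing import Any

Rule = dict[str, Any]


def ttl_rule(days: int) -> Rule:
    """Build a rule that deletes objects under ttl={days}d/ at {days} days of age."""
    return {
        "action": {"type": "Delete"},
        "condition": {"age": days, "matchesPrefix": [f"ttl={days}d/"]},
    }


def is_ttl_rule(rule: Rule) -> bool:
    """True if the rule is scoped only to ttl=... prefixes (hypothetical convention)."""
    prefixes = rule.get("condition", {}).get("matchesPrefix", [])
    return bool(prefixes) and all(p.startswith("ttl=") for p in prefixes)


def merge_ttl_rules(existing: list[Rule], ttl_days: list[int]) -> list[Rule]:
    """Replace any previous ttl= rules and keep every unrelated rule as-is."""
    kept = [rule for rule in existing if not is_ttl_rule(rule)]
    return kept + [ttl_rule(days) for days in sorted(set(ttl_days))]


# Example: an unrelated storage-class rule survives the merge.
existing_rules: list[Rule] = [
    {
        "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
        "condition": {"age": 365},
    }
]
merged_rules = merge_ttl_rules(existing_rules, [1, 30])  # 1d is an assumed TTL
```

Because previous ttl= rules are dropped before the new ones are appended, running the merge twice yields the same rule list, so the script can be re-run safely.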
Current usage of marin_temp_bucket(): lib/iris/src/iris/cli/cluster.py:317 (iris ephemeral state), lib/marin/src/marin/training/training.py:109,250 (compilation cache + training paths), lib/marin/src/marin/execution/disk_cache.py:101, lib/zephyr/src/zephyr/execution.py:1580 (zephyr chunk storage), experiments/ferries/datakit_*.py, experiments/dedup/poc_nemotron.py, scripts/datakit/run_source_sampling.py, .github/workflows/marin-datakit-*.yaml.
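For these call sites, the repointed helper would resolve to a path inside the main bucket rather than a marin-tmp-* bucket. A minimal sketch of that path construction follows; temp_path, its signature, and the region and prefix values in the example are hypothetical, and the real marin_temp_bucket() in lib/rigging may take different arguments and resolve the region through its own mapping.

```python
def temp_path(region: str, ttl_days: int, prefix: str = "") -> str:
    """Return a scratch path under the main bucket's ttl={N}d/ prefix.

    Hypothetical helper: assumes the main bucket is named marin-{region}.
    """
    if ttl_days <= 0:
        raise ValueError("ttl_days must be a positive number of days")
    base = f"gs://marin-{region}/ttl={ttl_days}d"
    cleaned = prefix.strip("/")
    return f"{base}/{cleaned}" if cleaned else base


# Example with an assumed region name and prefix.
cache_path = temp_path("us-central2", 30, "compilation-cache")
```

Keeping the TTL in the path means the lifecycle rule alone determines when an object expires, so callers do not need any extra cleanup logic.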
Open questions:
- [...] marin-{region} bucket.
- [...] ttl=Nd/ paths that would suddenly become deletable.
- [...] REGION_TO_TMP_BUCKET).
Definition of Done
- infra/configure_temp_buckets.py (renamed/generalized) merges ttl=Nd/-prefix rules into marin-{region} buckets without clobbering unrelated lifecycle rules.
- REGION_TO_TMP_BUCKET in lib/rigging/src/rigging/filesystem.py repointed to main buckets, or folded into REGION_TO_DATA_BUCKET.
- scripts/ops/cross_region.py:155-160 tmp- special case removed.
- lib/iris/scripts/setup_iam.py:482 IAM probe path updated.
- lib/iris/tests/test_marin_fs.py, tests/test_grug_launch_checkpoint_paths.py, tests/test_training.py, tests/execution/test_disk_cache.py.
- .github/workflows/marin-datakit-{smoke,nemotron-ferry}.yaml.
- docs/tutorials/storage-bucket.md (Step 4), experiments/ferries/OPS.md.
- marin-tmp-* buckets drained and deleted after transition (max 30d for compilation cache TTL).
cc @dlwh @rjpower