Handle resuming tile cache downloads #242
💡 Codex Review
Here are some automated review suggestions for this pull request.
fishnet["index"] = fishnet.index
existing_tile_paths = self._list_existing_tile_filepaths(target_uri)
completed_tile_ids = self._get_completed_tile_ids(existing_tile_paths)
unretrieved_tiles = fishnet.drop(completed_tile_ids, errors="ignore")
Tile cache completeness check ignores S3 pagination
_has_incomplete_tile_cache decides whether to rebuild by listing the existing tile files. For S3 targets, _list_existing_tile_filepaths delegates to _list_all_tiff_filepaths_in_s3_folder, which issues a single list_objects_v2 call, and a single call returns at most 1,000 keys. On caches with more than 1,000 tiles, completed_tile_ids therefore covers only the first page of objects, so unretrieved_tiles stays non‑empty. The cache is then always treated as incomplete even when it is fully populated, which forces unnecessary reprocessing for large cities.
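One possible fix is to follow every result page instead of reading only the first response. The sketch below is a minimal illustration, not the repository's code: the function name list_tiff_keys, its parameters, and the .tif/.tiff suffix filter are assumptions, since the body of _list_all_tiff_filepaths_in_s3_folder is not shown here. It relies on the boto3 paginator for list_objects_v2 and expects the caller to pass in a client such as boto3.client("s3").

```python
from typing import Any, List


def list_tiff_keys(s3_client: Any, bucket: str, prefix: str) -> List[str]:
    """Return every .tif/.tiff key under `prefix`, following all result pages.

    Hypothetical helper: `s3_client` is assumed to be a boto3 S3 client,
    e.g. boto3.client("s3"). The suffix filter is an assumption about how
    tiles are named.
    """
    # The paginator re-issues list_objects_v2 with the continuation token
    # until the listing is no longer truncated.
    paginator = s3_client.get_paginator("list_objects_v2")
    keys: List[str] = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when no keys match.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.lower().endswith((".tif", ".tiff")):
                keys.append(key)
    return keys
```

Taking the client as a parameter also lets a test substitute a stub client, so the pagination logic can be exercised without boto3 or network access.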
Summary
Testing
pytest tests/test_tile_cache_resumption.py (fails: missing boto3 dependency in test environment; network access blocked when attempting to install)