Enhance arXiv retrieval robustness: add retry logic for HTTP errors by dekrt · Pull Request #261 · TideDra/zotero-arxiv-daily

dekrt · 2026-06-06T03:59:54Z

This pull request improves the robustness of the arXiv paper retrieval process by handling additional retryable HTTP errors and adds tests to ensure correct behavior in these scenarios. The main changes are the introduction of a set of retryable status codes, enhanced error handling logic during batch retrieval, and new tests for these cases.

Error handling improvements:

Introduced RETRYABLE_ARXIV_STATUSES in arxiv_retriever.py to define which HTTP status codes (429, 500, 502, 503, 504) should trigger a retry when communicating with the arXiv API.
Updated the _retrieve_raw_papers method to retry on any status in RETRYABLE_ARXIV_STATUSES, log a warning, and skip the batch once the maximum number of retries is exhausted. Any other HTTP error is still raised.
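
The control flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in arxiv_retriever.py: `HTTPStatusError`, `retrieve_in_batches`, `fetch_batch`, and the `delay` parameter are hypothetical names introduced here, and the real client may expose the status code differently. Only `RETRYABLE_ARXIV_STATUSES`, `max_batch_retries`, `batch_succeeded`, and the batch size of 20 are taken from the PR.

```python
import logging
import time

logger = logging.getLogger(__name__)

# Status codes the PR description lists as retryable.
RETRYABLE_ARXIV_STATUSES = frozenset({429, 500, 502, 503, 504})


class HTTPStatusError(Exception):
    """Hypothetical stand-in for whatever HTTP error the real client raises."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def retrieve_in_batches(ids, fetch_batch, batch_size=20, max_batch_retries=3, delay=0.0):
    """Fetch ids batch by batch; retry transient errors, then skip the batch.

    fetch_batch is an assumed callable: it takes a list of ids and returns a
    list of papers, raising HTTPStatusError on failure.
    """
    papers = []
    for i in range(0, len(ids), batch_size):
        batch = ids[i:i + batch_size]
        batch_succeeded = False
        for attempt in range(1, max_batch_retries + 1):
            try:
                papers.extend(fetch_batch(batch))
                batch_succeeded = True
                break
            except HTTPStatusError as exc:
                if exc.status not in RETRYABLE_ARXIV_STATUSES:
                    raise  # non-retryable: surface the error to the caller
                if attempt < max_batch_retries:
                    logger.warning("Batch %d got %d, retrying", i // batch_size, exc.status)
                    time.sleep(delay)
                else:
                    logger.warning(
                        "Skipping batch %d after %d retries due to arXiv API %d",
                        i // batch_size, max_batch_retries, exc.status,
                    )
        if not batch_succeeded:
            logger.warning("No papers retrieved for batch %d", i // batch_size)
    return papers
```

One consequence of this design is that a persistently failing batch costs at most `max_batch_retries` requests, and the remaining batches are still processed rather than aborting the whole run.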

Testing improvements:

Added pytest as a test dependency.
Added tests to verify that batches are skipped after retryable HTTP errors and that non-retryable errors are raised, ensuring the new error handling logic works as intended.
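
As a rough idea of what such tests might look like, the sketch below exercises a simplified single-batch helper with `unittest.mock.Mock`. It does not reproduce the contents of tests/retriever/test_arxiv_retriever.py: `HTTPStatusError` and `fetch_batch_or_skip` are hypothetical stand-ins defined only for this example. The test functions use bare asserts so pytest can collect them; with pytest imported, `pytest.raises` would be the usual way to write the second check.

```python
from unittest.mock import Mock

# Status codes the PR description lists as retryable.
RETRYABLE_ARXIV_STATUSES = frozenset({429, 500, 502, 503, 504})


class HTTPStatusError(Exception):
    """Hypothetical stand-in for the HTTP error raised by the real client."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def fetch_batch_or_skip(fetch, max_retries=3):
    """Simplified single-batch stand-in: return papers, or None if skipped."""
    for _ in range(max_retries):
        try:
            return fetch()
        except HTTPStatusError as exc:
            if exc.status not in RETRYABLE_ARXIV_STATUSES:
                raise  # non-retryable: propagate immediately
    return None  # retries exhausted on a retryable status: skip the batch


def test_batch_skipped_after_retryable_errors():
    # Every call fails with 503, so the batch should be skipped, not raised.
    fetch = Mock(side_effect=HTTPStatusError(503))
    assert fetch_batch_or_skip(fetch, max_retries=3) is None
    assert fetch.call_count == 3


def test_non_retryable_error_is_raised():
    # A 400 is not in the retryable set, so it should surface on the first call.
    fetch = Mock(side_effect=HTTPStatusError(400))
    try:
        fetch_batch_or_skip(fetch, max_retries=3)
    except HTTPStatusError as exc:
        assert exc.status == 400
    else:
        raise AssertionError("expected HTTPStatusError")
    assert fetch.call_count == 1
```

Checking `call_count` in both tests pins down how many requests were made, which separates "retried and then skipped" from "failed on the first attempt".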

Make arXiv retrieval resilient to transient 5xx API failures in `calculate-and-send`

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR improves arXiv batch retrieval resilience by retrying additional transient HTTP errors (not just 429), skipping batches after exhausting retries, and adding tests to validate retryable vs non-retryable behavior.

Changes:

Add a shared set of retryable arXiv HTTP statuses and use it in _retrieve_raw_papers.
Skip a batch after max retries for retryable status codes and continue processing.
Add pytest coverage for retryable (503) and non-retryable (400) HTTP error handling.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| tests/retriever/test_arxiv_retriever.py | Adds tests covering skipped batches after retryable errors and raising on non-retryable HTTP errors. |
| src/zotero_arxiv_daily/retriever/arxiv_retriever.py | Expands retry logic to multiple transient HTTP statuses and adds batch-skipping behavior after retries. |

+                    elif status in RETRYABLE_ARXIV_STATUSES:
+                        logger.warning(
+                            f"Skipping batch {i // 20} after {max_batch_retries} retries due to arXiv API {status}"
+                        )
+                        break
                    else:
                        raise
+            if not batch_succeeded:
+                logger.warning(f"No papers retrieved for batch {i // 20}")


Copilot AI and others added 3 commits June 5, 2026 06:15

Initial plan

91e8649

Handle transient arXiv API 5xx errors in batch retrieval

f0df45c

Merge pull request #1 from dekrt/copilot/fix-calculate-and-send-job

3e85ca5

Copilot AI review requested due to automatic review settings June 6, 2026 03:59

Copilot AI reviewed Jun 6, 2026

dekrt wants to merge 3 commits into TideDra:main from dekrt:main

3 participants
