Fix non-pickleable retriever worker failures #203

Merged: TideDra merged 1 commit into main from fix-issue-202-bufferedreader on Mar 18, 2026
TideDra (Owner) commented on Mar 18, 2026

Summary

  • catch conversion failures inside process-pool workers so non-pickleable exceptions do not crash the whole retrieval batch
  • harden arXiv PDF/source downloads so that failures degrade to warnings and fallback behavior can proceed
  • add a regression test covering a worker-raised HTTPError carrying a BufferedReader
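The first bullet can be illustrated with a minimal sketch. It is not the code from this PR: `convert_paper`, `safe_convert`, and `run_batch` are hypothetical names, and the failing conversion is simulated. The idea is that the worker catches the exception itself and returns only plain strings, so nothing that holds an open file handle has to be pickled on the way back to the parent process.

```python
import io
from concurrent.futures import ProcessPoolExecutor
from typing import Dict, Iterable, Optional, Tuple
from urllib.error import HTTPError

# (paper_id, converted_text, error_text); every field is picklable.
WorkerResult = Tuple[str, Optional[str], Optional[str]]


def convert_paper(paper_id: str) -> str:
    """Hypothetical conversion step; ids starting with "bad" simulate a failed download."""
    if paper_id.startswith("bad"):
        # An HTTPError that carries an open BufferedReader, as in the linked issue.
        body = io.BufferedReader(io.BytesIO(b"not found"))
        raise HTTPError("https://example.invalid/" + paper_id, 404, "Not Found", {}, body)
    return "converted " + paper_id


def safe_convert(paper_id: str) -> WorkerResult:
    """Worker entry point: never lets an exception object leave the process."""
    try:
        return paper_id, convert_paper(paper_id), None
    except Exception as exc:
        # Reduce the exception to text before it crosses the process boundary.
        return paper_id, None, f"{type(exc).__name__}: {exc}"


def run_batch(paper_ids: Iterable[str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Convert a batch in worker processes; one failure does not abort the others."""
    results: Dict[str, str] = {}
    failures: Dict[str, str] = {}
    with ProcessPoolExecutor() as pool:
        for paper_id, text, error in pool.map(safe_convert, paper_ids):
            if error is None:
                results[paper_id] = text
            else:
                failures[paper_id] = error
    return results, failures
```

On platforms that start workers with `spawn`, `run_batch` would need to be called from under an `if __name__ == "__main__":` guard. Returning the error as text loses the traceback object, so a real implementation might also include `traceback.format_exc()` in the returned string.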

Testing

  • uv run pytest -q tests/retriever/test_arxiv_retriever.py -k non_pickleable_worker_errors
  • uv run python -m compileall src tests/retriever/test_arxiv_retriever.py
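The `-k non_pickleable_worker_errors` filter selects tests whose names contain that substring. The actual test is not shown on this page. A rough sketch of what such a regression test could check, with a hypothetical `summarize_error` helper and an assumed test name, is:

```python
import io
import pickle
from urllib.error import HTTPError


def make_unpicklable_error() -> HTTPError:
    """Build an HTTPError whose body is a BufferedReader, mirroring the issue title."""
    body = io.BufferedReader(io.BytesIO(b"payload"))
    return HTTPError("https://example.invalid/pdf", 503, "Service Unavailable", {}, body)


def summarize_error(exc: BaseException) -> str:
    """Hypothetical helper: the plain-text form a worker could return instead."""
    return f"{type(exc).__name__}: {exc}"


def test_non_pickleable_worker_errors() -> None:
    err = make_unpicklable_error()

    # The raw exception cannot be pickled because it holds a BufferedReader.
    try:
        pickle.dumps(err)
    except TypeError:
        pickled_ok = False
    else:
        pickled_ok = True
    assert not pickled_ok

    # The text summary survives a pickle round trip unchanged.
    summary = summarize_error(err)
    assert pickle.loads(pickle.dumps(summary)) == summary
    assert summary == "HTTPError: HTTP Error 503: Service Unavailable"
```

This sketch checks picklability directly rather than starting a process pool, which keeps it fast and independent of the multiprocessing start method. The real test may exercise the retriever end to end.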

Closes #202

TideDra merged commit 168f2cb into main on Mar 18, 2026
1 check passed
TideDra deleted the fix-issue-202-bufferedreader branch on March 25, 2026, 08:56
Development

Successfully merging this pull request may close these issues.

TypeError: cannot pickle 'BufferedReader' instances
