Skip to content

Commit f83b3f9

Browse files
authored
Merge pull request #223 from TideDra/dev
test: rewrite test suite with native stubs, 86% coverage
2 parents 91632d7 + e30e47b commit f83b3f9

28 files changed

Lines changed: 1710 additions & 538 deletions

.github/workflows/ci.yml

Lines changed: 1 addition & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -12,31 +12,13 @@ on:
1212
jobs:
1313
pytest:
1414
runs-on: ubuntu-latest
15-
services:
16-
mailhog:
17-
image: mailhog/mailhog:latest
18-
ports:
19-
- 1025:1025 # SMTP
20-
openai:
21-
image: tidedra/mock_openai:latest
22-
ports:
23-
- 30000:30000
2415
steps:
2516
- name: Checkout
2617
uses: actions/checkout@v6
2718

2819
- name: Setup uv
2920
uses: astral-sh/setup-uv@v7.1.4
3021

31-
3222
- name: Run Pytest
33-
env:
34-
ZOTERO_ID: "0"
35-
ZOTERO_KEY: "AbCdEfGhIjKlMnOpQrStUvWx"
36-
SENDER: "test@example.com"
37-
RECEIVER: "test@example.com"
38-
SENDER_PASSWORD: "test"
39-
OPENAI_API_KEY: "sk-xxx"
40-
OPENAI_API_BASE: "http://openai:30000/v1"
4123
run: |
42-
uv run pytest -m ""
24+
uv run pytest -m "" --cov=src/zotero_arxiv_daily --cov-report=term-missing

CLAUDE.md

Lines changed: 81 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,81 @@
1+
# CLAUDE.md
2+
3+
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
4+
5+
## Project Overview
6+
7+
Zotero-arXiv-Daily recommends new arXiv/bioRxiv/medRxiv papers based on a user's Zotero library. It computes embedding similarity between new papers and the user's existing library, generates TLDRs via LLM, and delivers results by email. Designed to run as a GitHub Actions workflow at zero cost.
8+
9+
## Commands
10+
11+
```bash
12+
# Run the application
13+
uv run src/zotero_arxiv_daily/main.py
14+
15+
# Run tests (excludes slow tests by default)
16+
uv run pytest
17+
18+
# Run all tests including slow ones
19+
uv run pytest -m ""
20+
21+
# Run a single test
22+
uv run pytest tests/test_utils.py::TestGlobMatch -v
23+
24+
# Install/sync dependencies
25+
uv sync
26+
```
27+
28+
No linter or formatter is configured.
29+
30+
## Architecture
31+
32+
The app follows a linear pipeline orchestrated by `Executor` (`src/zotero_arxiv_daily/executor.py`):
33+
34+
1. **Fetch Zotero corpus** — retrieves user's library papers via pyzotero API
35+
2. **Filter corpus** — applies `include_path` glob patterns to select relevant collections
36+
3. **Retrieve new papers** — fetches from configured sources (arXiv RSS, bioRxiv/medRxiv REST API)
37+
4. **Rerank** — scores candidates by weighted similarity to corpus (newer Zotero papers weighted higher)
38+
5. **Generate TLDRs + affiliations** — via OpenAI-compatible LLM API
39+
6. **Render + send email** — HTML email via SMTP
40+
41+
### Plugin Systems
42+
43+
**Retrievers** (`src/zotero_arxiv_daily/retriever/`): Register via `@register_retriever` decorator, discovered by `get_retriever_cls()`. Each retriever implements `_retrieve_raw_papers()` and `convert_to_paper()`.
44+
45+
**Rerankers** (`src/zotero_arxiv_daily/reranker/`): Register via `@register_reranker` decorator, discovered by `get_reranker_cls()`. Two implementations: `local` (sentence-transformers) and `api` (OpenAI-compatible embeddings endpoint).
46+
47+
### Configuration
48+
49+
Uses Hydra + OmegaConf. Config is composed from `config/base.yaml` (defaults) + `config/custom.yaml` (user overrides). Environment variables are interpolated via `${oc.env:VAR_NAME,default}` syntax. Entry point uses `@hydra.main`.
50+
51+
### Data Classes
52+
53+
`Paper` and `CorpusPaper` in `src/zotero_arxiv_daily/protocol.py`. `Paper` has LLM-powered methods (`generate_tldr`, `generate_affiliations`) that call the OpenAI API directly.
54+
55+
## Testing
56+
57+
Tests marked `@pytest.mark.slow` require heavy dependencies (e.g., sentence-transformers model download) and are skipped locally by default (`addopts = "-m 'not slow'"` in pyproject.toml). All other tests run with pure Python stubs (no Docker containers needed).
58+
59+
```bash
60+
# Run tests (excludes slow tests)
61+
uv run pytest
62+
63+
# Run all tests including slow ones
64+
uv run pytest -m ""
65+
66+
# Run with coverage
67+
uv run pytest --cov=src/zotero_arxiv_daily --cov-report=term-missing
68+
```
69+
70+
## gstack
71+
72+
Use the `/browse` skill from gstack for all web browsing. Never use `mcp__claude-in-chrome__*` tools.
73+
74+
Available skills: `/office-hours`, `/plan-ceo-review`, `/plan-eng-review`, `/plan-design-review`, `/design-consultation`, `/design-shotgun`, `/design-html`, `/review`, `/ship`, `/land-and-deploy`, `/canary`, `/benchmark`, `/browse`, `/connect-chrome`, `/qa`, `/qa-only`, `/design-review`, `/setup-browser-cookies`, `/setup-deploy`, `/retro`, `/investigate`, `/document-release`, `/codex`, `/cso`, `/autoplan`, `/plan-devex-review`, `/devex-review`, `/careful`, `/freeze`, `/guard`, `/unfreeze`, `/gstack-upgrade`, `/learn`.
75+
76+
If gstack skills aren't working, run `cd .claude/skills/gstack && ./setup` to build the binary and register skills.
77+
78+
## Git Workflow
79+
80+
- PRs should target the `dev` branch, not `main`
81+
- Current development branch: `dev`

pyproject.toml

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -37,9 +37,9 @@ url = "https://download.pytorch.org/whl/cpu"
3737
explicit = true
3838

3939
[tool.pytest.ini_options]
40-
addopts = "-m 'not ci'"
40+
addopts = "-m 'not slow'"
4141
markers = [
42-
"ci: tests that only run in CI (require external services)",
42+
"slow: tests that are slow (e.g. download models)",
4343
]
4444
filterwarnings = [
4545
"ignore::DeprecationWarning:multiprocessing",
@@ -49,4 +49,5 @@ filterwarnings = [
4949
dev = [
5050
"ipykernel>=7.1.0",
5151
"pytest>=8.4.1",
52+
"pytest-cov>=6.0",
5253
]

tests/__init__.py

Whitespace-only changes.

tests/canned_responses.py

Lines changed: 231 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,231 @@
1+
"""Shared stub factories for tests. No unittest.mock anywhere."""
2+
3+
from datetime import datetime
4+
from types import SimpleNamespace
5+
6+
from zotero_arxiv_daily.protocol import CorpusPaper, Paper
7+
8+
9+
# ---------------------------------------------------------------------------
10+
# OpenAI client stub
11+
# ---------------------------------------------------------------------------
12+
13+
_AFFILIATION_MARKER = "You are an assistant who perfectly extracts affiliations"
14+
_AFFILIATION_RESPONSE = '["TsingHua University","Peking University"]'
15+
_TLDR_RESPONSE = "Hello! How can I assist you today?"
16+
17+
18+
def _make_chat_response(content: str) -> SimpleNamespace:
19+
return SimpleNamespace(
20+
choices=[
21+
SimpleNamespace(
22+
message=SimpleNamespace(content=content),
23+
finish_reason="stop",
24+
index=0,
25+
)
26+
],
27+
id="chatcmpl-stub",
28+
created=1765197615,
29+
model="gpt-4o-mini-2024-07-18",
30+
object="chat.completion",
31+
)
32+
33+
34+
def _stub_chat_create(**kwargs):
35+
messages = kwargs.get("messages", [])
36+
request_str = str(messages)
37+
if _AFFILIATION_MARKER in request_str:
38+
return _make_chat_response(_AFFILIATION_RESPONSE)
39+
return _make_chat_response(_TLDR_RESPONSE)
40+
41+
42+
def _stub_embeddings_create(**kwargs):
43+
inputs = kwargs.get("input", [])
44+
n = len(inputs) if isinstance(inputs, list) else 1
45+
return SimpleNamespace(
46+
data=[SimpleNamespace(embedding=[0.1, 0.2, 0.3], index=i, object="embedding") for i in range(n)],
47+
model="text-embedding-3-large",
48+
object="list",
49+
)
50+
51+
52+
def make_stub_openai_client():
53+
"""Return a SimpleNamespace that quacks like openai.OpenAI().
54+
55+
chat.completions.create() and embeddings.create() behave identically
56+
to the Docker mock_openai server that CI previously relied on.
57+
"""
58+
return SimpleNamespace(
59+
chat=SimpleNamespace(
60+
completions=SimpleNamespace(create=_stub_chat_create),
61+
),
62+
embeddings=SimpleNamespace(create=_stub_embeddings_create),
63+
)
64+
65+
66+
# ---------------------------------------------------------------------------
67+
# Zotero client stub
68+
# ---------------------------------------------------------------------------
69+
70+
_DEFAULT_COLLECTIONS = [
71+
{
72+
"key": "COL1",
73+
"data": {"name": "survey", "parentCollection": False},
74+
},
75+
{
76+
"key": "COL2",
77+
"data": {"name": "topic-a", "parentCollection": "COL1"},
78+
},
79+
]
80+
81+
_DEFAULT_ITEMS = [
82+
{
83+
"data": {
84+
"title": "Stub Paper 1",
85+
"abstractNote": "Abstract of stub paper 1.",
86+
"dateAdded": "2026-01-15T10:00:00Z",
87+
"collections": ["COL2"],
88+
},
89+
},
90+
{
91+
"data": {
92+
"title": "Stub Paper 2",
93+
"abstractNote": "Abstract of stub paper 2.",
94+
"dateAdded": "2026-02-20T12:00:00Z",
95+
"collections": ["COL1"],
96+
},
97+
},
98+
]
99+
100+
101+
def make_stub_zotero_client(collections=None, items=None):
102+
"""Return a SimpleNamespace that quacks like pyzotero.zotero.Zotero.
103+
104+
Supports the call patterns used by Executor.fetch_zotero_corpus():
105+
zot.everything(zot.collections())
106+
zot.everything(zot.items(itemType=...))
107+
"""
108+
cols = collections if collections is not None else _DEFAULT_COLLECTIONS
109+
itms = items if items is not None else _DEFAULT_ITEMS
110+
111+
def everything(generator):
112+
return generator
113+
114+
def collections_fn():
115+
return cols
116+
117+
def items_fn(**kwargs):
118+
return itms
119+
120+
return SimpleNamespace(
121+
everything=everything,
122+
collections=collections_fn,
123+
items=items_fn,
124+
)
125+
126+
127+
# ---------------------------------------------------------------------------
128+
# SMTP stub
129+
# ---------------------------------------------------------------------------
130+
131+
132+
def make_stub_smtp(sent_emails: list):
133+
"""Return a class that records calls to sendmail().
134+
135+
Usage:
136+
sent = []
137+
monkeypatch.setattr(smtplib, "SMTP", make_stub_smtp(sent))
138+
...
139+
assert len(sent) == 1
140+
sender, recipients, body = sent[0]
141+
"""
142+
143+
class StubSMTP:
144+
def __init__(self, *args, **kwargs):
145+
pass
146+
147+
def starttls(self):
148+
pass
149+
150+
def login(self, user, password):
151+
pass
152+
153+
def sendmail(self, sender, recipients, msg):
154+
sent_emails.append((sender, recipients, msg))
155+
156+
def quit(self):
157+
pass
158+
159+
return StubSMTP
160+
161+
162+
# ---------------------------------------------------------------------------
163+
# Paper / CorpusPaper factories
164+
# ---------------------------------------------------------------------------
165+
166+
167+
def make_sample_paper(**overrides) -> Paper:
168+
defaults = dict(
169+
source="arxiv",
170+
title="Sample Paper Title",
171+
authors=["Author A", "Author B", "Author C"],
172+
abstract="This paper explores a novel approach to widget engineering.",
173+
url="https://arxiv.org/abs/2026.00001",
174+
pdf_url="https://arxiv.org/pdf/2026.00001",
175+
full_text="\\begin{document} Some text. \\end{document}",
176+
tldr=None,
177+
affiliations=None,
178+
score=None,
179+
)
180+
defaults.update(overrides)
181+
return Paper(**defaults)
182+
183+
184+
def make_sample_corpus(n: int = 3) -> list[CorpusPaper]:
185+
return [
186+
CorpusPaper(
187+
title=f"Corpus Paper {i}",
188+
abstract=f"Abstract for corpus paper {i}.",
189+
added_date=datetime(2026, 1, 1 + i),
190+
paths=[f"2026/survey/topic-{i}"],
191+
)
192+
for i in range(n)
193+
]
194+
195+
196+
# ---------------------------------------------------------------------------
197+
# bioRxiv canned API response
198+
# ---------------------------------------------------------------------------
199+
200+
SAMPLE_BIORXIV_API_RESPONSE = {
201+
"messages": [{"status": "ok"}],
202+
"collection": [
203+
{
204+
"doi": "10.1101/2026.03.01.000001",
205+
"title": "A biorxiv paper",
206+
"authors": "Smith, J.; Doe, A.; Lee, K.",
207+
"abstract": "We present a novel finding.",
208+
"date": "2026-03-02",
209+
"category": "bioinformatics",
210+
"version": "1",
211+
},
212+
{
213+
"doi": "10.1101/2026.03.01.000002",
214+
"title": "Another biorxiv paper",
215+
"authors": "Wang, L.; Chen, M.",
216+
"abstract": "We replicate a key result.",
217+
"date": "2026-03-02",
218+
"category": "genomics",
219+
"version": "1",
220+
},
221+
{
222+
"doi": "10.1101/2026.03.01.000003",
223+
"title": "Old biorxiv paper",
224+
"authors": "Old, R.",
225+
"abstract": "Yesterday's paper.",
226+
"date": "2026-03-01",
227+
"category": "bioinformatics",
228+
"version": "1",
229+
},
230+
],
231+
}

0 commit comments

Comments
 (0)