Commit 9a3b5f6

make sure citation has no hallucination
1 parent fe69523 commit 9a3b5f6

14 files changed (+2699, -67 lines)

README.md

Lines changed: 2 additions & 1 deletion
@@ -57,6 +57,7 @@ Give it a research idea and a target venue. ARK handles the rest.
 | **Compute** | Slurm • Local • AWS • GCP • Azure | Run experiments anywhere |
 | **Deep Research** | Gemini Deep Research integration | Literature survey before writing starts |
 | **Nano Banana** | AI figure generation | Concept diagrams via Gemini image models |
+| **Citation Integrity** | API-first citations • dual-source verification | DBLP/CrossRef — LLM never writes BibTeX |
 | **Smart Recovery** | Checkpoint/resume • meta-debug • self-repair | Handles LaTeX errors, experiment failures |
 | **Cost Tracking** | Per-iteration and cumulative reports | Know exactly what each iteration costs |

@@ -253,7 +254,7 @@ See [TODO.md](TODO.md) for the full list. Highlights:
 - **Cloud compute verification** — AWS/GCP/Azure compute backends exist but are not yet end-to-end validated
 - **Edge & custom environments** — support air-gapped HPC, Jetson, limited-connectivity labs
 - **Figure layout quality** — column overflow, font size mismatch, subplot alignment issues
-- **Citation authenticity** — LLM-generated references can be hallucinated; need post-write verification against Semantic Scholar / CrossRef
+- **Citation integrity** — API-first citation system (DBLP/CrossRef), LLM never writes BibTeX, per-iteration verification
 - **Integration testing** — no end-to-end pipeline test yet
 
 ## License
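
The "dual-source verification" named in the feature table could look roughly like the sketch below. This is an illustration only, not the code in `ark/citation.py`: the function names, the record shape (plain dicts with `title` and `year` keys), and the matching rule are all assumptions made for the example.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    words = re.sub(r"[^a-z0-9]+", " ", ascii_only.lower()).split()
    return " ".join(words)


def records_agree(dblp_record: dict, crossref_record: dict) -> bool:
    """Hypothetical rule: two records confirm each other when their
    normalized titles match and their years are equal."""
    same_title = normalize_title(dblp_record["title"]) == normalize_title(
        crossref_record["title"]
    )
    same_year = str(dblp_record["year"]) == str(crossref_record["year"])
    return same_title and same_year
```

Comparing normalized titles rather than raw strings keeps differences in case, punctuation, or accents between the two databases from causing false mismatches.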

README_ar.md

Lines changed: 2 additions & 1 deletion
@@ -57,6 +57,7 @@
 | **Compute** | Slurm • Local • AWS • GCP • Azure | Run experiments anywhere |
 | **Deep Research** | Gemini Deep Research integration | Literature survey before writing starts |
 | **Nano Banana** | AI figure generation | Concept diagrams via Gemini models |
+| **Citation Integrity** | API-first citations • dual-source verification | DBLP/CrossRef — LLM never writes BibTeX |
 | **Smart Recovery** | Checkpoints • automatic debugging • self-repair | Handles LaTeX errors and experiment failures |
 | **Cost Tracking** | Per-iteration and cumulative reports | Know exactly what each iteration costs |

@@ -253,7 +254,7 @@ pip install -e ".[research]" # + Gemini Deep Research and Nano Banana
 - **Cloud compute verification** — AWS/GCP/Azure code exists but has not been tested end to end
 - **Limited and custom environments** — support offline HPC, Jetson, labs with limited connectivity
 - **Figure layout quality** — column-width overflow, font size mismatch, subplot alignment issues
-- **Citation credibility** — AI-generated references may be fake; post-write verification via Semantic Scholar / CrossRef is required
+- **Citation integrity** — API-first citation system (DBLP/CrossRef), LLM never writes BibTeX, automatic verification every iteration
 - **Integration testing** — no end-to-end pipeline test yet
 
 ## License

README_zh.md

Lines changed: 2 additions & 1 deletion
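@@ -57,6 +57,7 @@ ARK coordinates 8 specialized AI agents to **plan experiments, write code, run
 | **Compute Backends** | Slurm • Local • AWS • GCP • Azure | Run experiments on any platform |
 | **Deep Research** | Gemini Deep Research integration | Automatic literature review before writing |
 | **Nano Banana** | AI figure generation | Concept diagrams via Gemini image models |
+| **Citation Integrity** | API-first citations • dual-source verification | DBLP/CrossRef — LLM never writes BibTeX |
 | **Smart Recovery** | Checkpoint/resume • meta-debug • self-repair | Handles LaTeX errors, experiment failures |
 | **Cost Tracking** | Per-iteration and cumulative reports | Know exactly what each iteration costs |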

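@@ -253,7 +254,7 @@ pip install -e ".[research]" # + Gemini Deep Research and Nano Banana
 - **Cloud compute verification** — AWS/GCP/Azure compute backend code exists but has not been validated end to end
 - **Edge & custom environments** — support offline HPC, Jetson, restricted-network labs
 - **Figure layout quality** — column-width overflow, font size mismatch, subplot alignment issues
-- **Citation authenticity** — LLM-generated references may be hallucinated; post-write verification via Semantic Scholar / CrossRef is needed
+- **Citation integrity** — API-first citation system (DBLP/CrossRef), LLM never writes BibTeX, automatic verification every iteration
 - **Integration testing** — no end-to-end pipeline test yet
 
 ## License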

TODO.md

Lines changed: 10 additions & 9 deletions
@@ -53,15 +53,16 @@
 - Need: stricter post-compilation visual checks — compare rendered PDF region against template spec
 - Consider: pixel-level overlap detection for text/figure collisions
 
-### [ ] Citation authenticity & hallucination
-- LLM-generated references are frequently hallucinated (wrong author, wrong year, non-existent papers)
-- Current pipeline has no citation verification step
-- Need: post-write citation verification phase
-- Cross-check each `\cite{}` entry against Semantic Scholar / CrossRef / Google Scholar API
-- Verify: title exists, authors match, year matches, DOI resolves
-- Flag or remove unverifiable citations
-- Need: researcher agent should provide real BibTeX entries from actual database queries, not LLM memory
-- Consider: mandatory `references.bib` sourced exclusively from API-fetched entries
+### [x] Citation authenticity & hallucination
+- Implemented API-first citation system (`ark/citation.py`)
+- LLM never writes BibTeX — all entries fetched from DBLP / CrossRef official APIs
+- Search cascade: DBLP → CrossRef → arXiv → Semantic Scholar
+- Researcher agent selects papers from API-verified candidate list only
+- Per-iteration verification: every review cycle re-verifies `references.bib`
+- Dual-source cross-confirmation (DBLP + CrossRef)
+- Preprint → published version auto-upgrade
+- Unused citation cleanup (removes uncited entries from `.bib`)
+- CLI tools: `ark cite-check`, `ark cite-search`, `ark cite-debug`
 
 ### [ ] Table formatting
 - Tables can overflow column/page width in two-column venues
ark/agents.py

Lines changed: 1 addition & 1 deletion
@@ -106,7 +106,7 @@ def _run(self):
 AGENT_CONTEXT_PROFILES = {
     "reviewer": {"memory": True, "deep_research": False, "prior_context": False, "context_files": False},
     "planner": {"memory": True, "deep_research": False, "prior_context": True, "context_files": False},
-    "writer": {"memory": False, "deep_research": False, "prior_context": True, "context_files": False},
+    "writer": {"memory": False, "deep_research": True, "prior_context": True, "context_files": False},
     "experimenter": {"memory": False, "deep_research": True, "prior_context": False, "context_files": True},
     "researcher": {"memory": False, "deep_research": True, "prior_context": False, "context_files": True},
     "visualizer": {"memory": False, "deep_research": False, "prior_context": False, "context_files": False},

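The effect of flipping `deep_research` to `True` for the writer can be illustrated with a small sketch of how such a profile might gate what goes into an agent's prompt. Only the profile values come from the diff; `enabled_sections`, `build_context`, and the `providers` mapping are hypothetical names invented for this example, and how `ark/agents.py` actually consumes the profiles is not shown in this commit view.

```python
# Two profiles copied from the diff (writer shown with its new value).
AGENT_CONTEXT_PROFILES = {
    "writer": {"memory": False, "deep_research": True, "prior_context": True, "context_files": False},
    "visualizer": {"memory": False, "deep_research": False, "prior_context": False, "context_files": False},
}


def enabled_sections(agent: str) -> list:
    """Names of the context sections switched on for this agent."""
    profile = AGENT_CONTEXT_PROFILES[agent]
    return [name for name, enabled in profile.items() if enabled]


def build_context(agent: str, providers: dict) -> str:
    """Hypothetical assembly step: call the provider for each enabled
    section and join the resulting text blocks."""
    parts = [
        providers[name]()
        for name in enabled_sections(agent)
        if name in providers
    ]
    return "\n\n".join(parts)
```

Under these assumptions, the writer now receives the deep-research survey alongside its prior context, while the visualizer still receives nothing extra.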