Core idea: Trust the AI's judgment; code handles execution and guardrails only.
- DB as source of truth — project config and status live in SQLite; YAML is used only for per-agent runtime state
- Per-project isolation — each project gets its own conda env, sandboxed HOME, and `PYTHONNOUSERSITE=1`
- Skills over hard-coded rules — modular instruction sets (skills) are loaded at runtime to enforce best practices
ARK runs three phases in sequence:
┌─────────────────────────────────────────────────────────────────┐
│ ARK Pipeline │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Phase 1: Research (4-step) │
│ ┌──────────────┐ ┌─────────────┐ ┌─────────┐ ┌──────────┐ │
│ │Deep Research │─▶│ Initializer │─▶│ Planner │─▶│Experiment│ │
│ │(Gemini) │ │(bootstrap) │ │(plan) │ │(run) │ │
│ └──────────────┘ └─────────────┘ └─────────┘ └──────────┘ │
│ │
│ Phase 2: Dev │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ plan → experiment on Slurm → analyze → write draft │ │
│ └───────────────────────────────────────────────────────┘ │
│ │
│ Phase 3: Review (iterative loop) │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Compile │─▶│ Review │─▶│ Planner │─▶│ Execute │──┐ │
│ │ LaTeX │ │ Score │ │ Decide │ │ Run │ │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │
│ ▲ │ │
│ └──── Validate ◀────────────────────────────────────┘ │
│ (recompile) │
│ │
│ Loop until score ≥ threshold or human intervention │
└─────────────────────────────────────────────────────────────────┘
| Step | Agent | What Happens |
|---|---|---|
| 1 | Deep Research | Gemini literature survey, background knowledge gathering |
| 2 | Initializer | Bootstrap conda env, install builtin skills, prepare citations |
| 3 | Planner | Generate initial research plan from survey results |
| 4 | Experimenter | Run first round of experiments based on plan |
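The sequential hand-off in the table above can be sketched as a context-threading loop, where each step sees the outputs of everything before it (a hypothetical sketch; the real pipeline lives in `ark/pipeline.py` and its function names may differ):

```python
# Hypothetical sketch of the 4-step research pipeline: each step receives the
# accumulated context and contributes its own output under its name.
def run_research_phase(steps, initial_context=None):
    """Thread a context dict through Deep Research -> Initializer -> Planner -> Experimenter."""
    context = dict(initial_context or {})
    for name, step in steps:
        context[name] = step(context)  # later steps can read earlier outputs
    return context
```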
Each review-phase iteration runs five steps: Compile → Review → Plan → Execute → Validate.
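A minimal sketch of this score-gated loop (function and parameter names are illustrative, not ARK's API):

```python
# Hypothetical sketch of the Phase 3 review loop: run the five steps, check
# the reviewer's score, and repeat until it clears the threshold or the
# iteration budget runs out.
from typing import Callable, List

def review_loop(steps: List[Callable[[], None]],
                get_score: Callable[[], float],
                threshold: float,
                max_iters: int = 10) -> float:
    """Run Compile -> Review -> Plan -> Execute -> Validate until score >= threshold."""
    score = float("-inf")
    for _ in range(max_iters):
        for step in steps:          # the five steps, in order
            step()
        score = get_score()
        if score >= threshold:      # stop once the reviewer is satisfied
            break
    return score
```

In the real system the "or human intervention" exit would be an extra check inside the loop (e.g. a Telegram-triggered flag).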
The Planner outputs structured YAML action plans:
```yaml
actions:
  - agent: experimenter
    task: "Run perplexity validation experiment"
    priority: 1
  - agent: writer
    task: "Update Section 4.2"
    priority: 2
```

The memory module (SimpleMemory) tracks scores, detects stagnation, and prevents repetitive failures:
```python
class SimpleMemory:
    scores: List[float]        # Score history (last 20)
    best_score: float          # Historical best
    stagnation_count: int      # Consecutive stagnation count

    def record_score(score)    # Record a score
    def is_stagnating()        # Stagnation detection
    def get_context()          # Get context (Goal Anchor + score trends)
```

Additional features:
- Issue tracking: Content-based dedup — counts how many times each issue reappears across iterations
- Repair validation: Verifies that attempted fixes actually resolved the issue
- Strategy escalation: Automatically bans ineffective methods and suggests alternatives
- Meta-debugging: Triggers diagnostic when the system is stuck
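The SimpleMemory interface can be fleshed out into a minimal runnable sketch. Field names come from the source; the exact stagnation rule here is an assumption (a score that fails to beat the historical best counts as stagnant):

```python
# Minimal runnable sketch of SimpleMemory. The stagnation rule is an
# assumption: any score that does not improve on best_score increments
# stagnation_count, and a new best resets it.
from dataclasses import dataclass, field
from typing import List

@dataclass
class SimpleMemory:
    scores: List[float] = field(default_factory=list)  # score history (last 20)
    best_score: float = float("-inf")                  # historical best
    stagnation_count: int = 0                          # consecutive non-improvements

    def record_score(self, score: float) -> None:
        self.scores = (self.scores + [score])[-20:]    # keep a rolling window of 20
        if score > self.best_score:
            self.best_score = score
            self.stagnation_count = 0
        else:
            self.stagnation_count += 1

    def is_stagnating(self, patience: int = 3) -> bool:
        return self.stagnation_count >= patience
```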
Every agent invocation includes a constant "Goal Anchor" that describes the project's core objectives. This prevents agents from drifting off-topic over many iterations.
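Goal Anchor injection can be as simple as prefixing a constant block to every agent prompt. A hypothetical sketch (the template text and function name are illustrative):

```python
# Hypothetical sketch of Goal Anchor injection: the same objective block is
# prepended to every agent prompt so long-running loops cannot drift.
GOAL_ANCHOR = (
    "## Goal Anchor\n"
    "Core objective: {objective}\n"
    "Do not change the research question or target venue.\n"
)

def build_prompt(objective: str, agent_instructions: str) -> str:
    """Prefix agent-specific instructions with the constant Goal Anchor."""
    return GOAL_ANCHOR.format(objective=objective) + "\n" + agent_instructions
```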
Mixin-based design with 5 mixins:
```python
class Orchestrator(ResearchMixin, DevMixin, ReviewMixin, FigureMixin, BaseMixin):
    # Dispatches to the correct phase based on mode
    # Syncs status to DB after each step
    # Handles Telegram notifications
```

Modular instruction sets loaded at runtime:
| Skill | Purpose |
|---|---|
| research-integrity | Anti-simulation: agents must run real experiments |
| human-intervention | Escalation protocol via Telegram |
| env-isolation | Per-project environment boundaries |
| figure-integrity | Validates figures match actual data |
| page-adjustment | Content density control within page limits |
Skills are auto-installed during pipeline bootstrap (Research Phase Step 2).
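Runtime skill loading might look like the following sketch. It assumes `index.json` maps skill names to directories and that each skill ships a single instruction file named `SKILL.md` — both assumptions; the real registry schema may differ:

```python
# Hypothetical skill loader: read the registry (skills/index.json) and
# concatenate each requested skill's instruction file for prompt injection.
# The index schema and the SKILL.md filename are assumptions.
import json
from pathlib import Path

def load_skills(skills_dir: Path, wanted: list) -> str:
    """Concatenate the instruction files of the requested skills."""
    index = json.loads((skills_dir / "index.json").read_text())
    parts = []
    for name in wanted:
        entry = index[name]                       # e.g. {"path": "builtin/research-integrity"}
        skill_file = skills_dir / entry["path"] / "SKILL.md"
        parts.append(skill_file.read_text())
    return "\n\n".join(parts)
```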
Each project gets a sandboxed conda env:
- `provision_project_env()` clones the base env to `<project>/.env/`
- `project_env_ready()` checks if the env exists
- Orchestrator runs with `HOME=<project_dir>`, `PYTHONNOUSERSITE=1`
- Both CLI (`ark run`) and Web Portal auto-detect and use the project env
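A sketch of the provisioning and sandboxing described above. `provision_project_env` is named in the source; the conda invocation, the `base_env` argument, and the `sandbox_env` helper are assumptions:

```python
# Hypothetical sketch of per-project isolation: clone the base conda env into
# <project>/.env/ and build the sandboxed environment variables (HOME and
# PYTHONNOUSERSITE come from the source; everything else is illustrative).
import os
import subprocess
from pathlib import Path

def provision_project_env(project_dir: Path, base_env: str = "ark-base") -> Path:
    """Clone the base env into <project>/.env/ (skipped if already present)."""
    env_path = project_dir / ".env"
    if not env_path.exists():
        subprocess.run(
            ["conda", "create", "--clone", base_env, "--prefix", str(env_path), "-y"],
            check=True,
        )
    return env_path

def sandbox_env(project_dir: Path) -> dict:
    """Environment for agent subprocesses: sandboxed HOME, no user site-packages."""
    env = dict(os.environ)
    env["HOME"] = str(project_dir)
    env["PYTHONNOUSERSITE"] = "1"
    return env
```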
SQLite is the source of truth for project config and status:
- Project creation, config, phase status
- Score history, cost tracking
- CLI and webapp read/write the same DB
- YAML files under `auto_research/state/` are for per-agent runtime state only
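The shared-DB contract can be illustrated with a minimal sketch: CLI and webapp open the same SQLite file, so a status written by one is immediately visible to the other. The `projects` table and its columns here are illustrative, not ARK's real schema:

```python
# Minimal sketch of the shared-DB idea: two connections to the same SQLite
# file see each other's committed writes. Table/column names are assumptions.
import sqlite3

def connect(db_path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS projects ("
        " name TEXT PRIMARY KEY, phase TEXT, best_score REAL)"
    )
    return conn

def set_status(conn: sqlite3.Connection, name: str, phase: str, score: float) -> None:
    """Upsert a project's phase and best score."""
    conn.execute(
        "INSERT INTO projects VALUES (?, ?, ?) "
        "ON CONFLICT(name) DO UPDATE SET phase=excluded.phase, best_score=excluded.best_score",
        (name, phase, score),
    )
    conn.commit()
```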
| Agent | Role |
|---|---|
| initializer | Bootstraps project: conda env, skills, citations |
| reviewer | Reviews and scores the paper |
| planner | Analyzes issues, generates action plan (paper & dev modes) |
| experimenter | Designs, runs, and analyzes experiments |
| researcher | Literature search and experimental result analysis |
| writer | Writes/revises paper sections |
| visualizer | Checks and fixes figure/table quality |
| meta_debugger | System-level diagnosis |
| coder | Implements code changes (dev mode) |
ARK/
├── ark/
│ ├── orchestrator.py # Main loop (mixin-based)
│ ├── pipeline.py # Research phase 4-step pipeline
│ ├── memory.py # Score tracking, issue dedup, stagnation
│ ├── agents.py # Agent invocation
│ ├── execution.py # Agent execution and skill injection
│ ├── cli.py # CLI commands (ark new/run/status/...)
│ ├── compiler.py # LaTeX compilation
│ ├── citation.py # DBLP/CrossRef citation verification
│ ├── deep_research.py # Gemini Deep Research integration
│ ├── telegram.py # Telegram notifications + human intervention
│ ├── compute.py # Slurm/cloud compute backends
│ ├── templates/agents/ # Agent prompt templates
│ │ ├── initializer.prompt
│ │ ├── reviewer.prompt
│ │ ├── planner.prompt
│ │ ├── experimenter.prompt
│ │ ├── researcher.prompt
│ │ ├── writer.prompt
│ │ ├── visualizer.prompt
│ │ └── coder.prompt
│ └── webapp/
│ ├── app.py # Flask app
│ ├── db.py # SQLite models + state management
│ ├── jobs.py # Job launch, conda env provisioning
│ ├── routes.py # API routes + SSE
│ └── static/app.html # SPA frontend
├── skills/
│ ├── index.json # Skill registry
│ └── builtin/ # Built-in skills
│ ├── research-integrity/
│ ├── human-intervention/
│ ├── env-isolation/
│ ├── figure-integrity/
│ └── page-adjustment/
├── venue_templates/ # LaTeX templates per venue
├── tests/ # 115 tests
└── projects/ # Per-project directories (gitignored)
Removed or simplified in the current design:
- `events.py` — Event-driven system (replaced by Planner-based decisions)
- Complex Memory tracking (issues, effective_actions, failed_attempts) — simplified