AHT: Automatic Hyperparameter Tuning with Coding Agents

Tell the agent what to optimize. It reads your project, plans a strategy, runs experiments, and learns from each result, all while you enjoy your coffee.

Chinese README

TL;DR: AHT is a skill that turns a coding agent (e.g., OpenClaw, Claude Code, and OpenAI Codex) into an autonomous hyperparameter tuning researcher for any deep learning project built on Hydra.

Hyperparameter tuning remains one of the most tedious bottlenecks in deep learning research. Traditional search methods — grid search, random search, and Bayesian optimizers like Optuna — treat the hyperparameter space as a black box: they sample configurations, evaluate metrics, and repeat, without ever reading a line of code or understanding why a learning rate of 1e-3 works better than 1e-2. Researchers, on the other hand, bring intuition — they read the model, inspect loss curves, and reason about what to try next. But that intuition is expensive: it demands hours of manual intervention and context-switching between experiments.

AHT bridges this gap. It teaches a coding agent to tune hyperparameters the way a researcher would — by reading the project first, then reasoning about what to change — while inheriting the tirelessness of automated search: it runs overnight, manages its own experiment queue, and wakes itself up when a training job finishes.

Overview

AHT takes a fundamentally different approach. Instead of blind search, it equips a coding agent with the tools to understand the project first, then reason about what to change next:

Read — The agent walks through the codebase, parses the Hydra config hierarchy, and produces structured documentation (PROJECT.md, HPARAM.md) that captures the model architecture, training pipeline, and tunable knobs.
Plan — Before any experiment runs, the agent drafts a tuning strategy: which hyperparameters to prioritize, what ranges make sense given the architecture, and what patterns to watch for.
Run — Training commands are launched asynchronously in detached tmux sessions (locally or over SSH). The agent polls for completion, estimates ETAs, and uses cron reminders to wake itself up — no human babysitting required.
Analyze — After each run, TensorBoard event files are parsed into structured scalar summaries. The agent detects divergence, plateaus, and overfitting, then logs its findings in a cumulative report.
Learn — Each subsequent tuning decision is informed by the full run history: past overrides, metric trends, and the agent's own analysis. This closed loop lets the agent refine its strategy over time rather than exploring blindly.

The result is an iterative, context-aware tuning process that combines the rigor of systematic experimentation with the intuition of an experienced researcher — running autonomously from the first experiment to the final report.

Comparison with autonomous research / tuning approaches

Compared with existing autoresearch-like approaches, AHT occupies a specific point in the design space: it is packaged as a skill, Hydra-native, low-intrusion, and focused on tuning:

Repo	Scope	As a skill	Platform support	Intrusiveness to existing workflows
uditgoenka/autoresearch	general optimization / autonomous iteration	✅	Claude Code	High
ARIS ⚔️	ML research workflows	✅	Claude Code / Codex / OpenClaw / any LLM agent	Medium
aiming-lab/AutoResearchClaw	full autonomous research (idea → paper)	❌	OpenClaw / Claude Code / CLI	High
HKUDS/ClawTeam	multi-agent orchestration for autonomous experiments	✅	Claude Code / Codex / OpenClaw / nanobot / Cursor / custom CLI agents	Medium
karpathy/autoresearch	autonomous ML experimentation on a small LLM training repo	❌	-	N/A (independent project)
facebookresearch/how-to-autorl	RL hparam tuning	❌	Hydra	Low
AHT	hparam tuning for Hydra projects	✅	Claude Code, OpenClaw	Low

✨ Features

Project and config understanding

AHT walks through the target project to identify the entry script, Hydra config structure, and tunable hyperparameters, producing PROJECT.md and HPARAM.md as structured references for subsequent tuning decisions.
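As a rough illustration of what listing tunable hyperparameters can look like, the sketch below flattens a nested config into dotted keys of the kind Hydra accepts as command-line overrides, then keeps the numeric leaves as candidate knobs. The plain dict stands in for a composed Hydra config, and the function names and sample values are hypothetical; this is not AHT's actual implementation.

```python
from typing import Any, Dict


def flatten_config(cfg: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten a nested dict into dotted keys, e.g. optimizer -> lr becomes "optimizer.lr"."""
    flat: Dict[str, Any] = {}
    for key, value in cfg.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_config(value, path))
        else:
            flat[path] = value
    return flat


def numeric_knobs(cfg: Dict[str, Any]) -> Dict[str, float]:
    """Keep numeric leaves only (booleans excluded) as tuning candidates."""
    return {
        key: value
        for key, value in flatten_config(cfg).items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }


# Hypothetical config, for illustration only.
example_cfg = {
    "model": {"hidden_dim": 256, "dropout": 0.1, "name": "mlp"},
    "optimizer": {"lr": 1e-3, "weight_decay": 0.0},
    "trainer": {"max_epochs": 50, "amp": True},
}

for name, value in numeric_knobs(example_cfg).items():
    print(f"{name} = {value}")
```

Dotted keys such as optimizer.lr map directly onto Hydra's key=value override syntax, which is why a flat view is a convenient starting point for a document like HPARAM.md.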

TensorBoard event analysis

AHT exposes TensorBoard scalar data to the agent, allowing it to detect training patterns such as divergence, plateaus, and overfitting from logged metrics.
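To make the three patterns concrete, here is a minimal heuristic sketch that labels a run from its training and validation loss curves. The inputs are plain lists of scalar values; how AHT extracts them from event files is not shown, and the thresholds (window, plateau_tol, diverge_factor) are illustrative assumptions rather than values taken from the project.

```python
import math
from typing import List, Sequence


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def detect_patterns(
    train_loss: Sequence[float],
    val_loss: Sequence[float],
    window: int = 5,              # assumed comparison window, in logged points
    plateau_tol: float = 1e-3,    # assumed relative tolerance for "no progress"
    diverge_factor: float = 3.0,  # assumed blow-up ratio over the best loss
) -> List[str]:
    """Return heuristic labels for a run; assumes positive loss values."""
    findings: List[str] = []
    if not train_loss:
        return findings

    # Divergence: NaN/inf anywhere, or the latest loss far above the best loss seen.
    if any(not math.isfinite(x) for x in train_loss):
        return ["divergence"]
    if train_loss[-1] > diverge_factor * min(train_loss):
        findings.append("divergence")

    # Plateau: the mean of the last window barely differs from the window before it.
    if len(train_loss) >= 2 * window:
        previous = _mean(train_loss[-2 * window:-window])
        recent = _mean(train_loss[-window:])
        if abs(previous - recent) <= plateau_tol * max(abs(previous), 1e-12):
            findings.append("plateau")

    # Overfitting: validation loss rises over the window while training loss keeps falling.
    if len(train_loss) >= window and len(val_loss) >= window:
        if val_loss[-1] > val_loss[-window] and train_loss[-1] < train_loss[-window]:
            findings.append("overfitting")

    return findings
```

An agent can go well beyond fixed thresholds like these, for example by reading the curves in the context of the learning-rate schedule, but simple labels of this kind give it a structured starting point for the report.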

Context-aware tuning with run histories

In each tuning iteration, AHT spawns a subagent with the project overview, historical overrides, tuning strategy, and accumulated results as context, allowing it to learn from past runs and make informed decisions for the next override.
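The context handed to the subagent can be pictured as a short text document assembled from the run history. The sketch below is one possible shape for it, under stated assumptions: RunRecord, build_context, and the section titles are hypothetical names, not AHT's real data model or prompt format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RunRecord:
    """One finished run (hypothetical structure)."""
    run_id: int
    overrides: Dict[str, object]  # Hydra-style dotted keys, e.g. {"optimizer.lr": 1e-3}
    metric: float                 # value of the primary metric
    notes: str = ""               # short analysis summary


def overrides_to_args(overrides: Dict[str, object]) -> List[str]:
    """Render overrides as key=value command-line arguments."""
    return [f"{key}={value}" for key, value in overrides.items()]


def best_run(history: List[RunRecord], goal: str = "min") -> Optional[RunRecord]:
    """Pick the best run so far for a 'min' or 'max' optimization goal."""
    if not history:
        return None
    pick = min if goal == "min" else max
    return pick(history, key=lambda run: run.metric)


def build_context(project_overview: str, strategy: str,
                  history: List[RunRecord], metric_name: str, goal: str = "min") -> str:
    """Assemble the text context for the next tuning decision."""
    lines = [
        "Project overview:", project_overview, "",
        "Tuning strategy:", strategy, "",
        f"Run history ({metric_name}, goal: {goal}):",
    ]
    for run in history:
        args = " ".join(overrides_to_args(run.overrides)) or "(baseline)"
        lines.append(f"- run {run.run_id}: {args} -> {run.metric:.4f} {run.notes}".rstrip())
    best = best_run(history, goal)
    if best is not None:
        lines.append(f"Best so far: run {best.run_id} ({best.metric:.4f})")
    return "\n".join(lines)
```

Keeping the overrides in the same key=value form that is passed to the training command means the subagent sees exactly what was run, not a paraphrase of it.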

Async execution with tmux

Training runs are launched in detached tmux sessions (both locally and over SSH), enabling the agent to poll for completion, estimate ETAs, and set cron reminders instead of blocking.
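The launch-and-poll pattern can be sketched with standard tmux subcommands (new-session -d to start detached, has-session to check whether the session still exists). The helper names, the session name, and the step-based ETA formula below are assumptions made for illustration; AHT's own scripts may work differently.

```python
import shlex
from typing import List, Optional


def _maybe_over_ssh(cmd: List[str], host: Optional[str]) -> List[str]:
    """Run locally as-is, or wrap for SSH. The remote shell re-parses the
    command string, so every argument is quoted."""
    if host is None:
        return cmd
    return ["ssh", host, " ".join(shlex.quote(part) for part in cmd)]


def tmux_launch_cmd(session: str, train_cmd: str, host: Optional[str] = None) -> List[str]:
    """Command that starts train_cmd in a detached tmux session."""
    return _maybe_over_ssh(["tmux", "new-session", "-d", "-s", session, train_cmd], host)


def tmux_poll_cmd(session: str, host: Optional[str] = None) -> List[str]:
    """Command whose exit status is 0 while the session still exists."""
    return _maybe_over_ssh(["tmux", "has-session", "-t", session], host)


def estimate_eta_seconds(elapsed_s: float, steps_done: int, total_steps: int) -> Optional[float]:
    """Linear ETA from progress so far; None until at least one step is done."""
    if steps_done <= 0:
        return None
    return elapsed_s * (total_steps - steps_done) / steps_done


# Example (hypothetical session name and command); pass the lists to subprocess.run().
print(tmux_launch_cmd("aht-run-1", "python train.py optimizer.lr=0.001"))
print(tmux_poll_cmd("aht-run-1", host="some_remote_machine"))
```

Because the session is detached, the launching process returns immediately, and the agent can use the ETA to decide when to schedule its next check instead of blocking on the run.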

Experiment history and reporting

AHT maintains a structured session directory (aht/yyyy-mm-dd/hh-mm-ss/) with per-run configs, metrics, and analysis. A built-in reporting script can generate summary, Markdown, or HTML reports comparing runs.
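The timestamped layout can be reproduced with a few lines of standard-library code. In this sketch only the aht/yyyy-mm-dd/hh-mm-ss/ pattern comes from the description above; the per-run subdirectory name (run_000, run_001, ...) is a hypothetical choice for illustration.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def session_dir(project_root: str, now: Optional[datetime] = None) -> Path:
    """Build <project_root>/aht/yyyy-mm-dd/hh-mm-ss for a new tuning session."""
    now = now or datetime.now()
    return Path(project_root) / "aht" / now.strftime("%Y-%m-%d") / now.strftime("%H-%M-%S")


def run_dir(session: Path, run_index: int) -> Path:
    """Per-run subdirectory; the run_NNN naming is an assumption, not AHT's scheme."""
    return session / f"run_{run_index:03d}"


session = session_dir("/path/to/project", datetime(2026, 3, 1, 14, 5, 9))
print(session.as_posix())              # /path/to/project/aht/2026-03-01/14-05-09
print(run_dir(session, 0).as_posix())  # /path/to/project/aht/2026-03-01/14-05-09/run_000
```

Zero-padded date and time components keep sessions sorted chronologically under a plain directory listing.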

🔄 Workflow

Understand the project — Inspect the project structure and Hydra config hierarchy; generate PROJECT.md and HPARAM.md if missing.
Understand the run command — Analyze the user-provided training command to identify active configs, output paths, metric candidates, and relevant hyperparameters.
Create a session — Initialize a tuning session with the base command, primary metric, and optimization goal; auto-insert - override into the Hydra defaults list.
Tuning loop (baseline run + up to N iterations):
1. Spawn a subagent to decide the best override based on the strategy and run history.
2. Launch the run in a detached tmux session.
3. Poll the run status; set a cron reminder if still running.
4. Once finished, spawn a subagent to analyze the TensorBoard event file and update the report.
Finalize — Present the final report and best configuration to the user.
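The workflow above reduces to a short control loop: one baseline run, then up to N iterations of propose, run, and record. The sketch below captures only that skeleton. In AHT the proposal step is a subagent and the evaluation step is a tmux-launched training run followed by TensorBoard analysis; here both are passed in as plain callables, and all names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Overrides = Dict[str, float]
History = List[Tuple[Overrides, float]]


def tuning_loop(
    propose: Callable[[History], Overrides],  # stand-in for the tuning subagent
    evaluate: Callable[[Overrides], float],   # stand-in for launch + poll + analyze
    max_iters: int,
    goal: str = "min",
) -> Tuple[Tuple[Overrides, float], History]:
    """Run a baseline plus up to max_iters proposals; return the best run and the history."""
    history: History = [({}, evaluate({}))]   # baseline: no overrides
    for _ in range(max_iters):
        overrides = propose(history)          # decision informed by the full history
        history.append((overrides, evaluate(overrides)))
    pick = min if goal == "min" else max
    best = pick(history, key=lambda item: item[1])
    return best, history


# Toy stand-ins: halve the learning rate each iteration; the "loss" is the distance to 0.25.
def toy_propose(history: History) -> Overrides:
    last_lr = history[-1][0].get("lr", 1.0)
    return {"lr": last_lr / 2}


def toy_evaluate(overrides: Overrides) -> float:
    return abs(overrides.get("lr", 1.0) - 0.25)


best, history = tuning_loop(toy_propose, toy_evaluate, max_iters=3)
print(best)
```

Passing the whole history to the proposal step on every iteration is what distinguishes this loop from blind search: each decision can draw on every earlier override and its outcome.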

🚀 Quick Start

Claude Code

Clone the repo and create symlinks in the Claude Code skills directory:

git clone https://github.com/zxh0916/auto-hparam-tuning.git
cd auto-hparam-tuning
pip install -r requirements.txt
# global installation, create symlinks in ~/.claude/skills
bash install_claudecode.sh
# or project installation, create symlinks in /path/to/project/.claude/skills
bash install_claudecode.sh /path/to/project

OpenClaw

Clone the repo into your global skill directory and install the dependencies:

cd ~/.openclaw/skills
git clone https://github.com/zxh0916/auto-hparam-tuning.git
pip install -r auto-hparam-tuning/requirements.txt

Modify your OpenClaw config:

{
  "skills": {
    "load": {
      "extraDirs": [
        "~/.openclaw/skills/auto-hparam-tuning/skills"
      ]
    },
    "entries": {
      "auto-hparam-tuning": { "enabled": true },
      "aht-init": { "enabled": true }
    }
  }
}

Usage

/auto-hparam-tuning Please tune the project "/path/to/project" on "some_remote_machine", using the remote conda environment "some_remote_conda_env" and the local conda environment "some_local_conda_env".

Use Different Models for Subagents

You can specify different models for hyperparameter tuning and result analysis by setting environment variables in openclaw.json:

{
  "env": {
    "AHT_TUNING_MODEL": "minimax/minimax-m2.5",
    "AHT_ANALYZE_MODEL": "moonshot/kimi-k2.5"
  }
}

If these variables are unset, the agent's default model (agents.list[].model.primary) is used.
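The fallback behavior can be expressed as a small lookup. This is a sketch of the rule as described, not AHT's actual code: resolve_model, the role names, and the default model string are hypothetical, and only the two environment variable names come from the config above.

```python
import os
from typing import Mapping, Optional

# Variable names from the README; the role keys are hypothetical labels.
_ROLE_TO_ENV = {
    "tuning": "AHT_TUNING_MODEL",
    "analyze": "AHT_ANALYZE_MODEL",
}


def resolve_model(role: str, default_model: str,
                  env: Optional[Mapping[str, str]] = None) -> str:
    """Return the model configured for a subagent role, or the default if unset or empty."""
    env = os.environ if env is None else env
    return env.get(_ROLE_TO_ENV[role]) or default_model


# Example with an explicit environment mapping (the default name is a placeholder).
example_env = {"AHT_TUNING_MODEL": "minimax/minimax-m2.5"}
print(resolve_model("tuning", "some-default-model", example_env))
print(resolve_model("analyze", "some-default-model", example_env))
```

Treating an empty string the same as an unset variable avoids accidentally requesting a model with no name.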

📝 TODO List

🤗 Citing

If you find this project useful in your research, please cite Hydra and AHT using the following BibTeX entries:

@Misc{Zhang2026AHT,
  author =       {Xinhong Zhang and Weipu Zhang and Haolin Chen},
  title =        {AHT: Automatic Hyperparameter Tuning with Coding Agents using Hydra},
  howpublished = {GitHub},
  year =         {2026},
  url =          {https://github.com/zxh0916/auto-hparam-tuning}
}

@Misc{Yadan2019Hydra,
  author =       {Omry Yadan},
  title =        {Hydra - A framework for elegantly configuring complex applications},
  howpublished = {GitHub},
  year =         {2019},
  url =          {https://github.com/facebookresearch/hydra}
}

If you have any questions, feel free to open an issue or join the WeChat group.
