Commit caf4f7b

RESEARCH.md and improvements
1 parent 373073d commit caf4f7b

24 files changed

Lines changed: 1771 additions & 9 deletions

.github/workflows/tests.yml

Lines changed: 37 additions & 1 deletion
```diff
@@ -32,7 +32,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        python-version: ["3.10", "3.11", "3.12"]
+        python-version: ["3.10", "3.11", "3.12", "3.13"]
 
     env:
       PYTHONUTF8: "1"
@@ -100,3 +100,39 @@ jobs:
       - name: Run full test suite
         run: |
           pytest ${{ inputs['pytest-args'] || '-q -o addopts=' }}
+
+  numpy2-compat:
+    name: NumPy 2 compatibility probe
+    runs-on: ubuntu-latest
+    if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
+    continue-on-error: true
+
+    env:
+      PYTHONUTF8: "1"
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.13"
+          cache: "pip"
+          cache-dependency-path: |
+            pyproject.toml
+
+      - name: Install with NumPy 2 override
+        run: |
+          python -m pip install --upgrade pip
+          python -m pip install -e ".[dev]" pyscf
+          python -m pip install --upgrade --no-deps "numpy>=2,<3" "scipy>=1.17,<2"
+
+      - name: Debug environment
+        run: |
+          python --version
+          python -m pip freeze | sed -n '1,200p'
+
+      - name: Run fast tests against NumPy 2
+        run: |
+          pytest -q
```

MANIFEST.in

Lines changed: 1 addition & 0 deletions
```diff
@@ -2,6 +2,7 @@
 include README.md
 include USAGE.md
 include THEORY.md
+include RESEARCH.md
 include CHANGELOG.md
 include LICENSE
 include pyproject.toml
```

README.md

Lines changed: 20 additions & 1 deletion
````diff
@@ -370,7 +370,8 @@ Use these in order:
 1. [`README.md`](README.md) for orientation and quickstart
 2. [`USAGE.md`](USAGE.md) for CLI and Python entrypoints
 3. [`THEORY.md`](THEORY.md) for algorithmic background
-4. [`notebooks/README_notebooks.md`](notebooks/README_notebooks.md) for notebook navigation
+4. [`RESEARCH.md`](RESEARCH.md) for benchmark evidence standards
+5. [`notebooks/README_notebooks.md`](notebooks/README_notebooks.md) for notebook navigation
 
 Deeper implementation notes:
 
@@ -413,6 +414,24 @@ Run the full suite, including slow integration coverage, with:
 pytest -q -o addopts=''
 ```
 
+Run a registered benchmark suite and write reproducible artifacts with:
+
+```bash
+python -m common.benchmarks run --suite expert-z-cross-method --out benchmark_runs
+```
+
+List available suites with:
+
+```bash
+python -m common.benchmarks list
+```
+
+Compare two benchmark runs with:
+
+```bash
+python -m common.benchmarks compare --base old_run/h2-cross-method --head new_run/h2-cross-method
+```
+
 ---
 
 ## Support development
````

RESEARCH.md

Lines changed: 106 additions & 0 deletions
New file:

# Research Use

This repository is most useful as a reproducible evidence generator for
small-system quantum simulation studies. It is not a claim that any one method
is universally best, and it is not production chemistry software.

For the problem scope and user-facing workflows, read `PROBLEM.md`. For the
current benchmark inventory, read `notebooks/benchmarks/SUMMARY.md`. For
published tables and figures, read `notebooks/benchmarks/RESULTS.md`.
## Research Claims This Repo Can Support

Use this repo to support claims of these forms:

- for a named small molecule or low-qubit Hamiltonian, method A was more
  accurate, faster, or more stable than method B under a stated configuration
- a solver default is reasonable for a documented calibration panel
- a configuration is sensitive to seed, shots, noise channel, mapping, ansatz,
  optimizer, or active-space choice
- a non-chemistry Hamiltonian can be run through the same expert-mode API as the
  chemistry benchmarks

Do not use this repo, by itself, to claim:

- chemical accuracy for large molecules
- hardware performance or device-readiness
- universal algorithm rankings across problem classes
- production-quality quantum chemistry results
## Evidence Standard

A result should be treated as research evidence only when it records:

- the resolved problem: molecule or model, geometry, charge, basis, mapping,
  active space, qubit count, and Hamiltonian-term count where available
- the reference: exact diagonalization, Hartree-Fock, analytical model result,
  or an explicit statement that no reference is used
- the solver configuration: method, ansatz, optimizer, step counts, step sizes,
  QPE evolution settings, shots, seeds, and noise model
- the metrics: energy, absolute error against the reference where meaningful,
  runtime, cache-hit state, and method-specific diagnostics
- the statistical design: seed list, shot list, repetitions, failure criteria,
  and aggregation method for stochastic or optimizer-sensitive studies
- the environment: package version and relevant dependency versions for
  release-grade benchmark artifacts

The benchmark row contract is documented in
`notebooks/benchmarks/SCHEMA.md`.
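The evidence standard above can be made concrete as a single recorded row. The sketch below is an assumption-laden illustration: the field names and values are invented for this example, and the real row contract is the one defined in `notebooks/benchmarks/SCHEMA.md`.

```python
# A hypothetical evidence row illustrating the standard above. Field names
# and values are illustrative only; the real contract is documented in
# notebooks/benchmarks/SCHEMA.md.
evidence_row = {
    "problem": {
        "molecule": "H2", "basis": "sto-3g", "mapping": "jordan-wigner",
        "active_space": None, "n_qubits": 4, "n_hamiltonian_terms": 15,
    },
    "reference": {"kind": "exact_diagonalization", "energy": -1.1373},
    "solver": {
        "method": "vqe", "ansatz": "uccsd", "optimizer": "cobyla",
        "shots": None, "seed": 7, "noise_model": None,
    },
    "metrics": {
        "energy": -1.1370, "abs_error": 3.0e-4,
        "runtime_s": 1.8, "cache_hit": False,
    },
    "environment": {"package_version": "1.2.3", "numpy_version": "2.1.0"},
}

# A row is usable as evidence only if every top-level block is present.
required = {"problem", "reference", "solver", "metrics", "environment"}
assert required <= evidence_row.keys()
```

Recording an explicit `None` (no shots, no noise model) rather than omitting the key keeps the row self-describing, which is what makes later cross-run comparison meaningful.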
## Claim Levels

| Level | Meaning | Minimum evidence |
| --- | --- | --- |
| Smoke | The API path runs. | One tiny deterministic case. |
| Case study | A method behaves as reported on one problem. | One problem, fixed config, reference where available. |
| Benchmark | A comparison is decision-useful. | Multiple methods or settings, common reference, runtime/cache metadata, documented metrics. |
| Reproducibility study | Stability is measured. | Multiple seeds or shots, aggregate statistics, failure-rate notes. |
| Release-grade evidence | A result can be cited. | Curated artifact export, versioned code, clean validation checks, and documented limitations. |
## Benchmark Acceptance Checklist

Before treating a new notebook as a benchmark, verify that it:

- asks one explicit research or method-selection question
- avoids duplicating an existing notebook unless it replaces or generalizes it
- follows `notebooks/benchmarks/TEMPLATE.md` for question, scope, reference,
  metrics, aggregation, limitations, and artifact-export notes
- uses the shared problem-resolution and Hamiltonian pipeline
- compares against an exact or clearly documented reference when feasible
- reports cache hits separately from compute runtime
- exports any published table or figure through
  `scripts/export_benchmark_artifacts.py`
- states limitations in the notebook or nearby docs when a method is known to
  be noiseless-only, calibration-specific, or small-system-only
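The "cache hits separately from compute runtime" requirement fits in a few lines. This is a minimal sketch of the bookkeeping, not the repo's actual pipeline: the cache, the key, and the stand-in solver are all assumptions for illustration.

```python
import time

# Stand-in result cache; the repo's real caching layer may differ.
_cache = {}

def timed_solve(key, solve):
    """Run (or reuse) a solve and report (result, runtime_s, cache_hit)."""
    start = time.perf_counter()
    if key in _cache:
        # A hit costs only lookup time; report it as a hit, not as compute.
        return _cache[key], time.perf_counter() - start, True
    result = solve()
    _cache[key] = result
    return result, time.perf_counter() - start, False

# First call computes; the second call is a cache hit.
r1, t1, hit1 = timed_solve("h2", lambda: 42)
r2, t2, hit2 = timed_solve("h2", lambda: 42)
assert (hit1, hit2) == (False, True)
```

Keeping the hit flag in the benchmark row lets a reader discard near-zero "runtimes" that only measure dictionary lookup.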
## Release Protocol

For a release intended to be useful in research:

1. Run the default and full test suites.
2. Run registered benchmark suites that should become release evidence. Use
   `python -m common.benchmarks list` to inspect available suites, then run a
   selected suite, for example
   `python -m common.benchmarks run --suite h2-cross-method`.
3. Compare refreshed suite outputs against the previous evidence set with
   `python -m common.benchmarks compare --base old_run --head new_run`.
4. Rerun benchmark notebooks whose results are being refreshed.
5. Export curated artifacts with `python scripts/export_benchmark_artifacts.py`.
6. Confirm `notebooks/benchmarks/_artifacts/benchmark_manifest.json` describes
   the published artifact set.
7. Build docs with `python -m sphinx -W -b html docs docs/_build/html`.
8. Tag the code and attach or archive the curated artifact set if the release is
   meant to be cited directly.
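The comparison in step 3 is conceptually a drift check between two evidence sets. The sketch below illustrates that idea only; the name-to-energy row shape and the tolerance are assumptions, not the repo's actual `compare` implementation.

```python
# Hypothetical drift check in the spirit of the compare step; the mapping of
# case name -> energy and the tolerance are assumptions, not the repo's schema.
def detect_regressions(base, head, tol=1e-6):
    """Return case names whose head energy is missing or drifted beyond tol."""
    return [
        name for name, base_energy in base.items()
        if head.get(name) is None or abs(head[name] - base_energy) > tol
    ]

base_run = {"h2_sto3g_vqe": -1.137000, "h2_sto3g_qpe": -1.137100}
head_run = {"h2_sto3g_vqe": -1.137000, "h2_sto3g_qpe": -1.140000}
assert detect_regressions(base_run, head_run) == ["h2_sto3g_qpe"]
```

Treating a missing case as a regression (rather than skipping it) is a deliberate choice here: silently dropped benchmarks are as suspect as drifted ones.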
## Document Boundaries

The markdown files intentionally have separate jobs:

- `README.md`: installation, orientation, and quickstart
- `PROBLEM.md`: what practical problems the repo is for
- `THEORY.md`: algorithm background
- `USAGE.md`: API and CLI usage
- `RESEARCH.md`: evidence standards and benchmark acceptance rules
- `notebooks/benchmarks/SUMMARY.md`: benchmark inventory
- `notebooks/benchmarks/RESULTS.md`: curated result surfaces
- `notebooks/benchmarks/SCHEMA.md`: benchmark row and artifact metadata fields

common/__init__.py

Lines changed: 8 additions & 0 deletions
```diff
@@ -47,6 +47,14 @@
     ),
     "ionization_energy_panel": ("common.benchmarks", "ionization_energy_panel"),
     "summarize_problem": ("common.benchmarks", "summarize_problem"),
+    "compare_benchmark_runs": ("common.benchmarks", "compare_benchmark_runs"),
+    "list_benchmark_suites": ("common.benchmarks", "list_benchmark_suites"),
+    "run_benchmark_suite": ("common.benchmarks", "run_benchmark_suite"),
+    "environment_metadata": ("common.environment", "environment_metadata"),
+    "ensure_environment_metadata": (
+        "common.environment",
+        "ensure_environment_metadata",
+    ),
     "compute_fidelity": ("common.metrics", "compute_fidelity"),
     "ResolvedProblem": ("common.problem", "ResolvedProblem"),
     "resolve_problem": ("common.problem", "resolve_problem"),
```
