# Research Use

This repository is most useful as a reproducible evidence generator for small-system quantum simulation studies. It does not claim that any one method is universally best, and it is not production chemistry software.

For the problem scope and user-facing workflows, read `PROBLEM.md`. For the current benchmark inventory, read `notebooks/benchmarks/SUMMARY.md`. For published tables and figures, read `notebooks/benchmarks/RESULTS.md`.

## Research Claims This Repo Can Support

Use this repo to support claims of these forms:

- for a named small molecule or low-qubit Hamiltonian, method A was more accurate, faster, or more stable than method B under a stated configuration
- a solver default is reasonable for a documented calibration panel
- a configuration is sensitive to seed, shots, noise channel, mapping, ansatz, optimizer, or active-space choice
- a non-chemistry Hamiltonian can be run through the same expert-mode API as the chemistry benchmarks (a sketch follows these lists)

Do not use this repo, by itself, to claim:

- chemical accuracy for large molecules
- hardware performance or device-readiness
- universal algorithm rankings across problem classes
- production-quality quantum chemistry results
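
To make the last supported claim concrete, the sketch below builds the kind of low-qubit, non-chemistry Hamiltonian an expert-mode run would consume and computes the exact-diagonalization reference the evidence standard asks for. It deliberately does not call this repo's API; the real expert-mode entry points are documented in `USAGE.md`.

```python
# Sketch only: this does not use the repository's expert-mode API
# (see USAGE.md for the real entry points). It shows the kind of
# low-qubit, non-chemistry Hamiltonian such a run would consume, plus
# an exact-diagonalization reference value.
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

# 2-qubit transverse-field Ising model as weighted Pauli strings.
terms = [
    (-1.0, np.kron(Z, Z)),   # ZZ coupling
    (-0.5, np.kron(X, I2)),  # transverse field on qubit 0
    (-0.5, np.kron(I2, X)),  # transverse field on qubit 1
]
H = sum(coeff * op for coeff, op in terms)

# Exact ground-state energy, recordable as the reference value.
reference_energy = np.linalg.eigvalsh(H)[0]
print(f"exact ground-state energy: {reference_energy:.6f}")
```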

## Evidence Standard

A result should be treated as research evidence only when it records:

- the resolved problem: molecule or model, geometry, charge, basis, mapping, active space, qubit count, and Hamiltonian-term count where available
- the reference: exact diagonalization, Hartree-Fock, analytical model result, or an explicit statement that no reference is used
- the solver configuration: method, ansatz, optimizer, step counts, step sizes, QPE evolution settings, shots, seeds, and noise model
- the metrics: energy, absolute error against the reference where meaningful, runtime, cache-hit state, and method-specific diagnostics
- the statistical design: seed list, shot list, repetitions, failure criteria, and aggregation method for stochastic or optimizer-sensitive studies
- the environment: package version and relevant dependency versions for release-grade benchmark artifacts

The benchmark row contract is documented in `notebooks/benchmarks/SCHEMA.md`.
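
As a rough illustration of those fields in one place, a conforming row might resemble the dictionary below. Every key and value here is a hypothetical placeholder; the binding field names live in `notebooks/benchmarks/SCHEMA.md`.

```python
# Illustrative only: keys and values are hypothetical placeholders for
# the actual row contract in notebooks/benchmarks/SCHEMA.md.
benchmark_row = {
    # resolved problem
    "system": "H2", "geometry": "H 0 0 0; H 0 0 0.735", "charge": 0,
    "basis": "sto-3g", "mapping": "jordan_wigner", "active_space": None,
    "n_qubits": 4, "n_hamiltonian_terms": 15,
    # reference
    "reference": "exact_diagonalization",
    # solver configuration
    "method": "vqe", "ansatz": "uccsd", "optimizer": "cobyla",
    "max_steps": 200, "shots": None, "seed": 7, "noise_model": None,
    # metrics
    "energy": -1.137, "abs_error_vs_reference": 2.0e-4,
    "runtime_s": 3.4, "cache_hit": False,
    # statistical design and environment
    "seeds": [7], "repetitions": 1, "package_version": "0.1.0",
}
```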

## Claim Levels

| Level | Meaning | Minimum evidence |
| --- | --- | --- |
| Smoke | The API path runs. | One tiny deterministic case. |
| Case study | A method behaves as reported on one problem. | One problem, fixed config, reference where available. |
| Benchmark | A comparison is decision-useful. | Multiple methods or settings, common reference, runtime/cache metadata, documented metrics. |
| Reproducibility study | Stability is measured. | Multiple seeds or shots, aggregate statistics, failure-rate notes. |
| Release-grade evidence | A result can be cited. | Curated artifact export, versioned code, clean validation checks, and documented limitations. |

## Benchmark Acceptance Checklist

Before treating a new notebook as a benchmark, verify that it:

- asks one explicit research or method-selection question
- avoids duplicating an existing notebook unless it replaces or generalizes it
- follows `notebooks/benchmarks/TEMPLATE.md` for question, scope, reference, metrics, aggregation, limitations, and artifact-export notes
- uses the shared problem-resolution and Hamiltonian pipeline
- compares against an exact or clearly documented reference when feasible
- reports cache hits separately from compute runtime (see the sketch after this list)
- exports any published table or figure through `scripts/export_benchmark_artifacts.py`
- states limitations in the notebook or nearby docs when a method is known to be noiseless-only, calibration-specific, or small-system-only
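
For the cache-hit rule, the shape of the separation matters more than the mechanism. A minimal sketch, with `run_solver` as a hypothetical stand-in for the real solver call:

```python
# Sketch of reporting cache hits separately from compute runtime.
import time

def run_solver(config):
    # Hypothetical stand-in for the repository's real solver entry point.
    time.sleep(0.01)
    return {"energy": -1.0}

def timed_solve(config, cache):
    key = tuple(sorted(config.items()))
    if key in cache:
        # A cached result is returned as-is and flagged, so a cache hit
        # is never mistaken for a genuinely fast compute run.
        return {**cache[key], "cache_hit": True}
    start = time.perf_counter()
    result = run_solver(config)
    record = {**result,
              "cache_hit": False,
              "runtime_s": time.perf_counter() - start}
    cache[key] = record
    return record
```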

## Release Protocol

For a release intended to be useful in research:

1. Run the default and full test suites.
2. Run registered benchmark suites that should become release evidence. Use `python -m common.benchmarks list` to inspect available suites, then run a selected suite, for example `python -m common.benchmarks run --suite h2-cross-method`.
3. Compare refreshed suite outputs against the previous evidence set with `python -m common.benchmarks compare --base old_run --head new_run`.
4. Rerun benchmark notebooks whose results are being refreshed.
5. Export curated artifacts with `python scripts/export_benchmark_artifacts.py`.
6. Confirm `notebooks/benchmarks/_artifacts/benchmark_manifest.json` describes the published artifact set.
7. Build docs with `python -m sphinx -W -b html docs docs/_build/html`.
8. Tag the code and attach or archive the curated artifact set if the release is meant to be cited directly.
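
The scriptable steps above can be chained; a minimal driver sketch follows, reusing the commands exactly as written. The `pytest` invocation is an assumption about the test runner, the `--base`/`--head` names are placeholders, and the notebook reruns (step 4) and manifest check (step 6) stay manual.

```python
# Minimal release-driver sketch. Benchmark, export, and docs commands are
# the ones documented above; the pytest call is an assumed test runner,
# and old_run/new_run are placeholders for real evidence-set names.
import subprocess

steps = [
    ["pytest"],
    ["python", "-m", "common.benchmarks", "run", "--suite", "h2-cross-method"],
    ["python", "-m", "common.benchmarks", "compare",
     "--base", "old_run", "--head", "new_run"],
    ["python", "scripts/export_benchmark_artifacts.py"],
    ["python", "-m", "sphinx", "-W", "-b", "html", "docs", "docs/_build/html"],
]

for cmd in steps:
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)  # abort the release at the first failure
```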

## Document Boundaries

The markdown files intentionally have separate jobs:

- `README.md`: installation, orientation, and quickstart
- `PROBLEM.md`: what practical problems the repo is for
- `THEORY.md`: algorithm background
- `USAGE.md`: API and CLI usage
- `RESEARCH.md`: evidence standards and benchmark acceptance rules
- `notebooks/benchmarks/SUMMARY.md`: benchmark inventory
- `notebooks/benchmarks/RESULTS.md`: curated result surfaces
- `notebooks/benchmarks/SCHEMA.md`: benchmark row and artifact metadata fields