This file defines the role, scope, and minimum structure of registries in Invest-GT.
Registries are the governance layer for canonical repository artifacts.
They do not replace the artifacts themselves.
They make those artifacts findable, referenceable, status-aware, and governable across time.
A repository with explicit artifacts but without registries remains partially structured.
A repository with explicit artifacts and explicit registries becomes much easier to audit, compare, maintain, and scale.
This file is canonical for registry logic.
As soon as a repository contains multiple hypotheses, scenarios, models, runs, reports, and validation records, several structural problems appear unless a registry layer exists.
Typical failure modes include:
- multiple artifacts exist, but no one knows which one is active
- artifacts can be found by file path, but not by stable identity
- superseded artifacts remain in use because status is not visible
- runs reference objects informally rather than canonically
- reports cannot be tied cleanly back to artifact histories
- validation status remains scattered across notes instead of becoming governable
- old artifacts are kept, but not distinguished from current ones
Registries exist to prevent this.
A registry is a structured index of canonical artifacts within a defined artifact family.
A registry should answer questions such as:
- Which artifacts of this type exist?
- What is each artifact's stable ID?
- What is its current status?
- Which version is current?
- Where is the canonical artifact file or record?
- What does it depend on?
- What supersedes what?
- What is safe to use downstream?
- What is deprecated, rejected, invalidated, or archived?
A registry is therefore not just a list.
It is a governance surface.
A registry is not:
- a replacement for the artifact content
- a free-form note collection
- a general wiki page
- an ungoverned directory listing
- a hidden internal implementation detail
The registry must remain inspectable and meaningful at the repository level.
Invest-GT should distinguish clearly between:
- the artifact itself
- the artifact's metadata
- the artifact's registry presence
These are related, but not identical.
For example:
- a hypothesis file contains the actual claim structure
- the metadata inside that file identifies the artifact
- the hypothesis registry records that the artifact exists, what its current status is, and how it relates to other artifacts
This separation matters because governance requires an overview across artifacts, not only inside individual files.
The repository should eventually maintain registries for the main artifact families.
Recommended canonical registries:
- world snapshots
- hypotheses
- scenarios
- agents
- models
- runs
- results
- reports
- validation records
- datasets or source packages where relevant
Not every registry needs to be equally elaborate at the beginning, but the structure should be anticipated now.
Recommended directory: registry/
This keeps governance artifacts distinct from:
- `docs/` for conceptual definitions
- `data/` for source-like materials
- `pipeline/` for process structure
- artifact family directories such as `hypotheses/`, `models/`, `scenarios/`, or `simulations/`
A good initial registry layout would be:
- `registry/world-snapshots.yaml`
- `registry/hypotheses.yaml`
- `registry/scenarios.yaml`
- `registry/agents.yaml`
- `registry/models.yaml`
- `registry/runs.yaml`
- `registry/results.yaml`
- `registry/reports.yaml`
- `registry/validations.yaml`
- `registry/datasets.yaml`
This list may evolve, but these are the main canonical candidates.
A top-level registry layer has several advantages:
1. It makes status visible. A reviewer should not need to search the entire repository to know whether HYP-014 is active or superseded.
2. It strengthens references. Runs, reports, and validations can point to canonical IDs that are also registry-visible.
3. It scales with the repository. Once artifact families grow, directory browsing alone becomes insufficient.
4. It supports governance over time. Registries make lifecycle, supersession, and status legible across time.
5. It reduces hidden state. Without a registry, status often lives only in filenames, local memory, or implicit convention.
A registry entry should be a compact but sufficient governance record for one artifact.
The artifact file remains the richer source.
The registry remains the canonical index.
A registry entry should not duplicate the entire artifact content.
It should capture the minimum necessary to govern the artifact reliably.
Every registry entry should ideally support at least:
- `artifact_id`
- `artifact_type`
- `title`
- `status`
- `version`
- `canonical_ref`
- `created_at`
- `updated_at`
- `created_by`, where appropriate
- `supersedes`, where relevant
- `superseded_by`, where relevant
- `depends_on`, where relevant
- `tags`, where useful
These fields may be extended by family-specific needs.
The common fields are used as follows:
- `artifact_id`: the stable identity of the artifact, for example `HYP-014`.
- `artifact_type`: the artifact family, for example `hypothesis`, `scenario`, or `model`.
- `title`: a human-readable title. The title can improve over time without replacing the artifact ID.
- `status`: the current governance state of the artifact, for example `draft`, `active`, `validated`, `superseded`, or `rejected`.
- `version`: the currently indexed version or revision of the artifact, for example `v2`.
- `canonical_ref`: a path, pointer, or internal reference to the canonical artifact location, for example `hypotheses/active/HYP-014-energy-infrastructure-reallocation.md`.
- `created_at`, `updated_at`: lifecycle timestamps.
- `supersedes`, `superseded_by`: explicit governance links when one artifact replaces another.
- `depends_on`: artifact references that materially support this artifact. For example, a hypothesis may depend on `SNAP-004`, and a run may depend on `SCN-003`, `MOD-005`, and `HYP-014@v2`.
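The `depends_on` examples mix bare IDs (`SNAP-004`) with a version-pinned reference (`HYP-014@v2`), so registry tooling will usually need to split the two parts. The following Python sketch does that, assuming the `ID@vN` form seen in this example; `docs/ids-and-versioning.md` remains the authority on the real syntax, and `parse_ref` and `ArtifactRef` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Assumed syntax: an ID such as "HYP-014", optionally pinned with "@v<N>".
# docs/ids-and-versioning.md remains the authority on the real rules.
REF_PATTERN = re.compile(r"^(?P<artifact_id>[A-Z]+-\d+)(?:@(?P<version>v\d+))?$")


class ArtifactRef(NamedTuple):
    artifact_id: str
    version: Optional[str]  # None means the reference is not pinned to a version


def parse_ref(ref: str) -> ArtifactRef:
    """Split a reference such as 'HYP-014@v2' into its ID and optional version."""
    match = REF_PATTERN.match(ref.strip())
    if match is None:
        raise ValueError(f"unrecognised artifact reference: {ref!r}")
    return ArtifactRef(match.group("artifact_id"), match.group("version"))


# The run dependencies from the example above.
for ref in ["SCN-003", "MOD-005", "HYP-014@v2"]:
    print(parse_ref(ref))
```

Keeping the version optional lets the same parser serve both "latest governed version" references and pinned references used by runs.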
Different artifact families need different amounts of metadata.
The sections below describe the recommended role of each major registry.
registry/world-snapshots.yaml
Tracks the available world-state artifacts that may serve as contextual bases for hypotheses, scenarios, and runs.
Without a snapshot registry, it becomes difficult to know:
- which snapshot is current for a problem area
- which snapshots are archived
- which snapshots are invalidated
- what time references and scopes each snapshot covers
In addition to the common fields:
- `time_reference`
- `scope`
- `domain_coverage`
- `source_bundle_refs`
- `uncertainty_level`, where useful
registry/hypotheses.yaml
Tracks all canonical hypothesis artifacts and their governance state.
The hypothesis registry is one of the most important registries in the repository.
It should make clear:
- which hypotheses are active
- which are in review
- which were rejected
- which were superseded
- which are linked to which scenarios or snapshots
In addition to the common fields:
- `claim_summary`
- `confidence_status`
- `derived_from`
- `linked_snapshot_refs`
- `linked_scenario_refs`
- `validation_refs`
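To make the questions above answerable at a glance, a tool could group hypothesis entries by status. A minimal Python sketch, assuming entries are loaded as plain dictionaries with the common `artifact_id` and `status` fields; `status_overview`, the `in_review` status value, and all IDs other than `HYP-014` are illustrative assumptions.

```python
from collections import defaultdict


def status_overview(entries: list[dict]) -> dict[str, list[str]]:
    """Group artifact IDs by their current registry status."""
    overview: dict[str, list[str]] = defaultdict(list)
    for entry in entries:
        overview[entry["status"]].append(entry["artifact_id"])
    # Sort statuses and IDs so the overview is stable between runs.
    return {status: sorted(ids) for status, ids in sorted(overview.items())}


# Illustrative entries only; real values would come from registry/hypotheses.yaml.
hypotheses = [
    {"artifact_id": "HYP-014", "status": "active"},
    {"artifact_id": "HYP-009", "status": "superseded"},
    {"artifact_id": "HYP-015", "status": "in_review"},
    {"artifact_id": "HYP-011", "status": "rejected"},
    {"artifact_id": "HYP-016", "status": "active"},
]
print(status_overview(hypotheses))
```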
registry/scenarios.yaml
Tracks the repository's structured scenarios and their current lifecycle state.
Scenarios may be reused across runs, revised over time, or archived when no longer current.
A scenario registry helps answer:
- which scenario should be used
- which scenario versions exist
- which hypotheses support the scenario
- whether the scenario is still active
In addition to the common fields:
- `time_horizon`
- `problem_context`
- `linked_hypothesis_refs`
- `linked_agent_refs`
- `linked_model_refs`
registry/agents.yaml
Tracks canonical agent specifications available for simulation or structured reasoning.
Without an agent registry, the same actor may end up represented in multiple drifting forms.
The registry makes visible:
- which agent specs exist
- which are active
- which abstraction level each agent uses
- which scenarios or models rely on them
In addition to the common fields:
- `agent_type`
- `abstraction_level`
- `linked_scenario_refs`
- `linked_model_refs`
registry/models.yaml
Tracks model specifications and their governance status.
The model registry is central to simulation auditability.
It should make clear:
- which model is approved for what use
- whether a model is validated
- which model versions were used in which runs
- which models are deprecated or invalidated
In addition to the common fields:
- `model_purpose`
- `method_family`
- `input_contract_refs`
- `output_contract_refs`
- `validation_refs`
- `failure_mode_refs`
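Answering which model versions were used in which runs means reading the run registry from the model's side. A minimal Python sketch, assuming run entries carry `model_refs` pinned in the `ID@vN` form (as in `HYP-014@v2`); `runs_using_model` and the `RUN-` and `MOD-007` identifiers are hypothetical.

```python
def runs_using_model(runs: list[dict], model_id: str) -> dict[str, list[str]]:
    """Map each version of `model_id` to the runs that used it.

    Assumes references look like 'MOD-005@v2'; an unpinned reference
    such as 'MOD-005' is reported under the key 'unpinned'.
    """
    usage: dict[str, list[str]] = {}
    for run in runs:
        for ref in run.get("model_refs", []):
            ref_id, _, version = ref.partition("@")
            if ref_id != model_id:
                continue
            usage.setdefault(version or "unpinned", []).append(run["artifact_id"])
    return usage


# Hypothetical run entries for illustration.
runs = [
    {"artifact_id": "RUN-001", "model_refs": ["MOD-005@v1"]},
    {"artifact_id": "RUN-002", "model_refs": ["MOD-005@v2", "MOD-007@v1"]},
    {"artifact_id": "RUN-003", "model_refs": ["MOD-005"]},
    {"artifact_id": "RUN-004", "model_refs": ["MOD-005@v2"]},
]
print(runs_using_model(runs, "MOD-005"))
```

Surfacing unpinned references separately makes it visible when a run cannot be tied to one exact model version.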
registry/runs.yaml
Tracks all concrete execution instances.
Runs are where many artifacts are bound together into one execution context.
A run registry should make visible:
- what was executed
- when it ran
- whether it completed successfully
- which artifact versions were used
- where outputs are stored
In addition to the common fields:
- `run_purpose`
- `snapshot_refs`
- `hypothesis_refs`
- `scenario_ref`
- `agent_refs`
- `model_refs`
- `config_ref`
- `execution_status`
- `result_refs`
- `report_refs`
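Since a run binds many artifacts into one execution context, a useful check is that every reference names an explicit version. A minimal Python sketch, assuming the `ID@vN` pinning form and the reference fields listed above; `unpinned_refs` and `RUN-001` are hypothetical, and leaving `config_ref` unchecked is a design assumption.

```python
# Reference fields from the run registry list; config_ref is left out because
# configurations may not follow the artifact versioning scheme (an assumption).
REF_FIELDS = ("snapshot_refs", "hypothesis_refs", "scenario_ref", "agent_refs", "model_refs")


def unpinned_refs(run: dict) -> list[str]:
    """Return 'field: ref' strings for references that name no explicit version."""
    missing = []
    for name in REF_FIELDS:
        value = run.get(name)
        if value is None:
            continue
        # scenario_ref holds a single reference; the *_refs fields hold lists.
        refs = [value] if isinstance(value, str) else value
        missing.extend(f"{name}: {ref}" for ref in refs if "@" not in ref)
    return missing


run = {
    "artifact_id": "RUN-001",  # hypothetical run ID
    "snapshot_refs": ["SNAP-004@v1"],
    "hypothesis_refs": ["HYP-014@v2"],
    "scenario_ref": "SCN-003",
    "model_refs": ["MOD-005@v2"],
}
print(unpinned_refs(run))
```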
registry/results.yaml
Tracks the structured outputs of runs.
Results may be compared, reviewed, flagged, invalidated, or used in reports.
A result registry helps avoid confusion between:
- raw outputs
- accepted-for-review outputs
- invalidated outputs
- archived outputs
In addition to the common fields:
- `run_ref`
- `result_type`
- `warning_refs`
- `anomaly_refs`
- `validation_refs`
registry/reports.yaml
Tracks report bundles and their relation to upstream artifacts.
Reports are the communication surface of the repository.
Without a registry, it becomes difficult to know:
- which report is current
- what it was based on
- whether it is still valid
- whether a report has been revised or archived
In addition to the common fields:
- `report_type`
- `run_refs`
- `result_refs`
- `validation_refs`
- `published_at`, where relevant
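Whether a report is still valid depends partly on the status of the results it cites. A minimal Python sketch that flags reports citing invalidated or unregistered results, assuming dictionary entries with the fields listed above; the helper name, the `accepted_for_review` status value, and the `RES-` and `RPT-` identifiers are hypothetical.

```python
def reports_needing_review(reports: list[dict], results: list[dict]) -> list[str]:
    """Return IDs of reports that cite invalidated or unregistered results."""
    result_status = {r["artifact_id"]: r["status"] for r in results}
    flagged = []
    for report in reports:
        for ref in report.get("result_refs", []):
            status = result_status.get(ref.split("@", 1)[0])  # ignore any version suffix
            # An unregistered result (None) is treated as a problem as well.
            if status is None or status == "invalidated":
                flagged.append(report["artifact_id"])
                break  # one bad upstream result is enough to flag the report
    return flagged


# Hypothetical entries; the RES- and RPT- prefixes are placeholders.
results = [
    {"artifact_id": "RES-001", "status": "accepted_for_review"},
    {"artifact_id": "RES-002", "status": "invalidated"},
]
reports = [
    {"artifact_id": "RPT-001", "result_refs": ["RES-001"]},
    {"artifact_id": "RPT-002", "result_refs": ["RES-001", "RES-002"]},
    {"artifact_id": "RPT-003", "result_refs": ["RES-099"]},
]
print(reports_needing_review(reports, results))
```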
registry/validations.yaml
Tracks validation records and the artifacts they assess.
Validation should not remain scattered across comments and local notes.
The validation registry makes visible:
- what was checked
- against what criteria
- with what result
- whether unresolved issues remain
In addition to the common fields:
- `validation_type`
- `target_refs`
- `criteria_ref`
- `outcome`
- `reviewer`
- `issue_refs`
registry/datasets.yaml
Tracks curated datasets, source bundles, or structured input packages used in the repository.
The input layer is often where provenance and reproducibility first weaken.
A dataset registry helps preserve:
- source package identity
- scope and date information
- usage across snapshots and runs
- licensing or usage caveats where relevant
In addition to the common fields:
- `dataset_type`
- `source_origins`
- `time_coverage`
- `license_notes`
- `used_in_snapshot_refs`
Registries should remain concise enough to govern, but detailed enough to matter.
A registry entry should not attempt to reproduce the entire artifact.
It should capture enough information to support:
- discovery
- status visibility
- dependency tracing
- lifecycle governance
- audit preparation
If a field is too detailed for the registry, it likely belongs in the artifact itself.
Whenever a canonical artifact is created, revised, superseded, invalidated, rejected, or archived, the relevant registry should be updated.
Registries should therefore evolve with artifact governance, not as an afterthought.
A useful rule is:
no major artifact lifecycle change without a corresponding registry update
This helps prevent hidden state.
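One way to honour this rule is to route lifecycle changes through a single function that updates the status and both supersession links in one step. A minimal Python sketch for the supersession case, assuming the registry is held in memory as a dictionary keyed by artifact ID; `supersede` and `HYP-009` are hypothetical, and writing the result back to YAML is not shown.

```python
from datetime import datetime, timezone


def supersede(registry: dict[str, dict], old_id: str, new_id: str) -> None:
    """Record that `new_id` supersedes `old_id`, updating both entries together."""
    old, new = registry[old_id], registry[new_id]  # KeyError before any change
    if old["status"] == "superseded":
        raise ValueError(f"{old_id} is already superseded by {old.get('superseded_by')}")
    now = datetime.now(timezone.utc).isoformat()
    # Both sides of the link are written in the same step so they cannot drift apart.
    old["status"] = "superseded"
    old["superseded_by"] = new_id
    old["updated_at"] = now
    new["supersedes"] = old_id
    new["updated_at"] = now


registry = {
    "HYP-009": {"artifact_id": "HYP-009", "status": "active"},
    "HYP-014": {"artifact_id": "HYP-014", "status": "active"},
}
supersede(registry, "HYP-009", "HYP-014")
print(registry["HYP-009"]["status"], registry["HYP-014"]["supersedes"])
```

Validating before mutating means a rejected transition leaves the registry unchanged.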
In case of ambiguity between:
- an old filename
- a stale local note
- an untracked informal reference
- a registry entry
the registry should normally be treated as the canonical governance view.
This does not mean the registry replaces the artifact content.
It means the registry is the primary place to determine status and canonical identity.
Registries should not assume that folder names will stay fixed.
A canonical_ref may point to a file path today, but the artifact's stable identity must remain primary.
This means:
- artifact ID is the core identity
- registry tracks where the canonical artifact currently lives
- file movement should not destroy governance continuity
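Because other artifacts refer to `HYP-014` rather than to a file path, moving the file only requires updating one `canonical_ref`. A minimal Python sketch, assuming an in-memory registry keyed by artifact ID; `move_artifact`, `locate`, and the destination directory are hypothetical.

```python
def move_artifact(registry: dict[str, dict], artifact_id: str, new_ref: str) -> str:
    """Record a new location for an artifact and return the previous one."""
    entry = registry[artifact_id]  # entries are found by stable ID, not by path
    old_ref = entry["canonical_ref"]
    entry["canonical_ref"] = new_ref
    return old_ref


def locate(registry: dict[str, dict], ref: str) -> str:
    """Resolve a reference such as 'HYP-014@v2' to the artifact's current location.

    The version suffix is ignored here; locating historical revisions is out of scope.
    """
    return registry[ref.split("@", 1)[0]]["canonical_ref"]


registry = {
    "HYP-014": {
        "artifact_id": "HYP-014",
        "canonical_ref": "hypotheses/active/HYP-014-energy-infrastructure-reallocation.md",
    }
}
run_ref = "HYP-014@v2"  # stored in a run entry; it never mentions a path
print(locate(registry, run_ref))
# Hypothetical reorganisation of the hypotheses/ directory.
move_artifact(registry, "HYP-014", "hypotheses/energy/HYP-014-energy-infrastructure-reallocation.md")
print(locate(registry, run_ref))
```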
Registries must work together with the identity and versioning logic defined in docs/ids-and-versioning.md.
A registry entry should make it clear:
- which artifact ID is being referenced
- which version is active or indexed
- whether newer or older versions exist
- whether the artifact was superseded, deprecated, or invalidated
Registries should not hide version transitions.
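Supersession links are only useful if they can be followed reliably. A minimal Python sketch that walks `superseded_by` links to the artifact that currently governs, and raises an error rather than guessing when the chain is broken or circular; the in-memory registry form, the helper name `resolve_current`, and the IDs `HYP-003` and `HYP-009` are hypothetical.

```python
def resolve_current(registry: dict[str, dict], artifact_id: str) -> str:
    """Follow `superseded_by` links from `artifact_id` to the current artifact.

    Raises ValueError on a cycle or a link to an unregistered artifact,
    since either means the registry is inconsistent.
    """
    seen = [artifact_id]
    current = artifact_id
    while True:
        entry = registry.get(current)
        if entry is None:
            raise ValueError(f"{current} is referenced but not registered")
        successor = entry.get("superseded_by")
        if successor is None:
            return current
        if successor in seen:
            raise ValueError("supersession cycle: " + " -> ".join(seen + [successor]))
        seen.append(successor)
        current = successor


registry = {
    "HYP-003": {"superseded_by": "HYP-009"},
    "HYP-009": {"superseded_by": "HYP-014"},
    "HYP-014": {},
}
print(resolve_current(registry, "HYP-003"))
```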
Registries are not the same as stage contracts:
- `docs/contracts.md` defines what stages may consume and produce
- registries define which artifacts exist and in what governance state
Both are needed.
A stage may be allowed to consume hypotheses in principle.
The registry helps determine which specific hypotheses are active, valid, and available to consume.
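The division of labour can be sketched as a filter: the contract allows a stage to consume `hypothesis` artifacts, and the registry narrows that to specific entries. A minimal Python sketch over dictionary entries with the common fields; the helper name and the choice of `active` and `validated` as consumable statuses are assumptions, not rules from `docs/contracts.md`, and IDs other than `HYP-014` and `SCN-003` are illustrative.

```python
# Statuses a stage may consume; this set is an assumption for the sketch.
CONSUMABLE_STATUSES = frozenset({"active", "validated"})


def consumable(entries: list[dict], artifact_type: str) -> list[str]:
    """Return IDs of registry entries of `artifact_type` a stage may consume."""
    return [
        e["artifact_id"]
        for e in entries
        if e["artifact_type"] == artifact_type and e["status"] in CONSUMABLE_STATUSES
    ]


entries = [
    {"artifact_id": "HYP-014", "artifact_type": "hypothesis", "status": "active"},
    {"artifact_id": "HYP-009", "artifact_type": "hypothesis", "status": "superseded"},
    {"artifact_id": "HYP-011", "artifact_type": "hypothesis", "status": "rejected"},
    {"artifact_id": "HYP-016", "artifact_type": "hypothesis", "status": "validated"},
    {"artifact_id": "SCN-003", "artifact_type": "scenario", "status": "active"},
]
print(consumable(entries, "hypothesis"))
```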
Registries should not claim an artifact is validated unless a corresponding validation record exists or is explicitly referenced.
Validation-related status changes should remain evidence-linked.
This is especially important for:
- hypotheses
- models
- results
- reports
The registry should make this connection visible.
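This expectation lends itself to an automated check. A minimal Python sketch, assuming entries use the `validation_refs` field listed for several registries above and that the IDs from `registry/validations.yaml` are available as a set; `unsupported_validations` and the `VAL-`, `RES-`, and `HYP-016` identifiers are hypothetical.

```python
def unsupported_validations(entries: list[dict], validation_ids: set[str]) -> list[str]:
    """List entries marked 'validated' that cite no registered validation record."""
    problems = []
    for entry in entries:
        if entry["status"] != "validated":
            continue
        refs = entry.get("validation_refs", [])
        # At least one cited record must exist in the validation registry.
        if not any(ref.split("@", 1)[0] in validation_ids for ref in refs):
            problems.append(entry["artifact_id"])
    return problems


validation_ids = {"VAL-001"}
entries = [
    {"artifact_id": "MOD-005", "status": "validated", "validation_refs": ["VAL-001"]},
    {"artifact_id": "HYP-014", "status": "validated", "validation_refs": []},
    {"artifact_id": "RES-002", "status": "validated", "validation_refs": ["VAL-404"]},
    {"artifact_id": "HYP-016", "status": "active"},
]
print(unsupported_validations(entries, validation_ids))
```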
At the current maturity stage of the repository, registries should start simple.
A practical first version could use one YAML file per registry family with a list of entries.
This is sufficient to establish:
- canonical IDs
- current status
- canonical references
- basic dependency links
More complex indexing or automation can come later.
The important step now is to define the governance layer, not to overengineer it.
The exact schema may be defined later, but conceptually an entry might look like:
- artifact ID
- title
- version
- status
- canonical reference
- timestamps
- key dependency refs
- supersession fields
The implementation format is secondary to the governance clarity.
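For illustration only, the conceptual entry could be modelled as a Python dataclass mirroring the common fields. This is a sketch, not the future schema: the class name `RegistryEntry`, the use of plain strings for timestamps, and the sample title and dates are placeholders.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class RegistryEntry:
    """One governance record; the artifact file remains the richer source."""

    artifact_id: str
    artifact_type: str
    title: str
    status: str
    version: str
    canonical_ref: str
    created_at: str
    updated_at: str
    created_by: Optional[str] = None
    supersedes: Optional[str] = None
    superseded_by: Optional[str] = None
    # default_factory gives each entry its own list instead of a shared one.
    depends_on: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)


entry = RegistryEntry(
    artifact_id="HYP-014",
    artifact_type="hypothesis",
    title="Energy infrastructure reallocation",  # placeholder title
    status="active",
    version="v2",
    canonical_ref="hypotheses/active/HYP-014-energy-infrastructure-reallocation.md",
    created_at="2024-01-15",  # placeholder dates
    updated_at="2024-03-02",
    depends_on=["SNAP-004"],
)
print(asdict(entry))
```

`asdict` produces a plain dictionary, which maps directly onto one item in a per-family YAML list.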
For every registry family, the following questions should be answerable:
- What artifact family does this registry govern?
- Which artifacts of that family currently exist?
- What is the stable ID of each artifact?
- What is the current status of each artifact?
- Which version is current?
- Where is the canonical artifact?
- What depends on it or what does it depend on?
- What has been superseded, rejected, invalidated, or archived?
If these questions cannot be answered, the registry is too weak.
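Parts of this checklist can be automated as a registry lint. A minimal Python sketch that flags duplicate IDs, missing core fields, and one-sided supersession links; `registry_problems`, the chosen set of required fields, and the sample entries other than `HYP-014` are assumptions.

```python
from collections import Counter

# Fields without which the governance questions cannot be answered (assumed set).
REQUIRED_FIELDS = ("artifact_id", "artifact_type", "title", "status", "version", "canonical_ref")


def registry_problems(entries: list[dict]) -> list[str]:
    """Return human-readable problems that make a registry too weak to govern."""
    problems = []
    by_id = {e.get("artifact_id"): e for e in entries}

    # Stable identity: each ID must appear exactly once.
    for artifact_id, count in Counter(e.get("artifact_id") for e in entries).items():
        if artifact_id and count > 1:
            problems.append(f"{artifact_id}: registered {count} times")

    for e in entries:
        artifact_id = e.get("artifact_id") or "<no id>"
        problems.extend(
            f"{artifact_id}: missing {name}" for name in REQUIRED_FIELDS if not e.get(name)
        )
        # Supersession must be recorded consistently on both sides of the link.
        successor = e.get("superseded_by")
        if e.get("status") == "superseded" and not successor:
            problems.append(f"{artifact_id}: superseded but no superseded_by link")
        if successor:
            if successor not in by_id:
                problems.append(f"{artifact_id}: superseded_by unregistered {successor}")
            elif by_id[successor].get("supersedes") != artifact_id:
                problems.append(f"{artifact_id}: {successor} does not record supersedes")
    return problems


entries = [
    {"artifact_id": "HYP-009", "artifact_type": "hypothesis", "title": "Earlier framing",
     "status": "superseded", "version": "v3", "canonical_ref": "hypotheses/archive/HYP-009.md",
     "superseded_by": "HYP-014"},
    {"artifact_id": "HYP-014", "artifact_type": "hypothesis",
     "title": "Energy infrastructure reallocation", "status": "active", "version": "v2",
     "canonical_ref": "hypotheses/active/HYP-014-energy-infrastructure-reallocation.md"},
    {"artifact_id": "HYP-015", "artifact_type": "hypothesis", "title": "", "status": "draft",
     "version": "v1", "canonical_ref": "hypotheses/drafts/HYP-015.md"},
]
for problem in registry_problems(entries):
    print(problem)
```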
This file implies a structural addition to the repository:
Recommended directory: registry/
This directory should be treated as the governance index layer of the repository.
It does not replace artifact directories such as:
- `hypotheses/`
- `scenarios/`
- `models/`
- `simulations/`
- `reports/`
It complements them.
This file defines the role and required presence of registries.
It does not yet define:
- the final YAML or JSON schemas for each registry
- automated update mechanisms
- code-level registry APIs
- validation methodology
- storage tooling
- synchronization rules between registries and artifact-local metadata
Those should be specified separately.
This file is canonical for:
- the existence of registries as a governance layer
- the major registry families
- the role of registries in artifact lifecycle control
- the minimum governance expectations for registry entries
Any future governance system in this repository should either:
- implement this registry layer directly, or
- document explicitly why an equivalent mechanism is being used instead