Set up the Python environment using uv as the package manager:

```bash
# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create a virtual environment and install dependencies
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install the package in editable mode
uv pip install -e ".[dev]"
```

Alternatively, install the package in editable mode for local development with pip:

```bash
pip install -e .
```

This will install all dependencies specified in `pyproject.toml` and allow you to modify the code without reinstalling.
The context_builder module generates JSON context files describing the structure and dimensions of knowledge graphs and ontologies.
Quick reference:

- `python -m omnigraph_agent.context_builder.cli build <graph_id>` — build one graph (e.g. `nde`, `mondo`, `vbo`)
- `python -m omnigraph_agent.context_builder.cli build <graph_id> --format yaml` — YAML output
- `omnigraph-context-builder build-ubergraph-ontologies` — build all Ubergraph ontologies from ontologies.txt (writes to `dist/context/`)
A GitHub Action runs `build-ubergraph-ontologies` plus `build nde`, `build mondo`, and `build vbo` weekly (Thursdays 2:00 AM PT) and opens a PR to update `dist/context/` (see `.github/workflows/update-context-weekly.yml`).
Knowledge graphs contain instance data, entities, and relationships. Some knowledge graphs (like NDE) organize data into multiple repositories/catalogs.
Note: In NDE, a "repository" refers to a data catalog that aggregates datasets. Examples:
- ImmPort — immunological data repository
- Vivli — clinical research data repository
- Zenodo — general research data repository
- Project Tycho — epidemiological data repository
```bash
# Build all context files for a knowledge graph (JSON format, default)
python -m omnigraph_agent.context_builder.cli build nde

# Build context files in YAML format
python -m omnigraph_agent.context_builder.cli build nde --format yaml
```

Outputs:

- `dist/context/nde_global.json` (or `.yaml`) — global graph context
- `dist/context/nde_<repository>.json` (or `.yaml`) — repository-specific context files (if the graph has repositories, e.g., `nde_immport.json`)
- For knowledge graphs without repositories, only the global context file is generated
Output Formats:
- JSON (default): Machine-readable format, backward compatible
- YAML: Human-readable format, better for integration with context packs and manual editing
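As a sketch of how a downstream tool might consume one of these files — the loader function is illustrative, and only the `graph_id`/`endpoint` fields documented later in this README are assumed:

```python
import json
from pathlib import Path


def load_context(path: Path) -> dict:
    """Load a generated context file (JSON variant) and return it as a dict."""
    with path.open() as f:
        return json.load(f)


# Illustrative usage against the output layout described above:
# ctx = load_context(Path("dist/context/nde_global.json"))
# print(ctx["graph_id"], ctx["endpoint"])
```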
Ontologies define class hierarchies, relationships, and axioms. They are single-source knowledge structures:
```bash
# Build context file for an ontology (JSON format)
python -m omnigraph_agent.context_builder.cli build vbo

# Build context file in YAML format (recommended for ontologies)
python -m omnigraph_agent.context_builder.cli build vbo --format yaml

# Build with OBO Foundry metadata (automatically fetched for OBO ontologies)
python -m omnigraph_agent.context_builder.cli build mondo --format yaml
```

Outputs:

- `dist/context/vbo_global.json` (or `.yaml`) — global ontology context (no repository-specific files)
- For OBO ontologies (GO, MONDO, HP, etc.), automatically includes:
  - OBO Foundry metadata (description, homepage, license, domain)
  - LLM-optimized fields (`good_for`, `query_patterns`, `connects_to`)
To generate context files for every named graph in Ubergraph, use `build-ubergraph`. It discovers graphs via `SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o . } }`, keeps URIs under `.../graphs/`, and builds a minimal ontology config per `graph_id` (e.g. `mondo`, `go`, `hp`). Each graph produces only a global context file.
```bash
# Discover and build context for all Ubergraph /graphs/ (default limit 2000)
omnigraph-context-builder build-ubergraph

# Limit how many graphs to process (e.g. for a quick run)
omnigraph-context-builder build-ubergraph --limit 10

# Output YAML and persist generated configs under config/ for hand-editing
omnigraph-context-builder build-ubergraph --format yaml --write-configs

# By default, graph_ids that already have a config file (e.g. mondo, vbo) are skipped.
# Use --no-skip-existing-config to rebuild them using the generated minimal config.
omnigraph-context-builder build-ubergraph --no-skip-existing-config
```

Options:

- `--endpoint` — SPARQL endpoint (default: `https://ubergraph.apps.renci.org/sparql`)
- `--output-dir`, `-o` — output directory (default: `dist/context`)
- `--format` — `json` or `yaml`
- `--limit` — max named graphs to discover and process (default: 2000)
- `--skip-existing-config` / `--no-skip-existing-config` — skip `graph_id`s that already have a `config/{graph_id}.yaml` (default: on)
- `--write-configs` — write generated configs to `config/` for later editing
Note: Ubergraph may not expose `.../graphs/` URIs via `GRAPH ?g`. If discovery finds no graphs, use `build-ubergraph-ontologies` instead.
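The discover-then-filter behavior of `build-ubergraph` can be sketched in Python. The `SELECT` query text comes from this README; the filtering helper and its name are illustrative, not the actual implementation:

```python
# Query text as documented for build-ubergraph discovery
DISCOVERY_QUERY = "SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o . } }"


def filter_graph_uris(uris: list[str], limit: int = 2000) -> list[str]:
    """Keep only named-graph URIs under a /graphs/ path, up to `limit`."""
    kept = [u for u in uris if "/graphs/" in u]
    return kept[:limit]


# filter_graph_uris(["http://ubergraph.apps.renci.org/graphs/mondo",
#                    "http://example.org/other"])
# keeps only the /graphs/ URI
```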
Ubergraph’s `ontologies.txt` lists the OWL URLs it includes. Use `build-ubergraph-ontologies` to derive `graph_id`s from those URLs and build context files using `FROM <http://ubergraph.apps.renci.org/graphs/{graph_id}>`.
```bash
# Fetch ontologies.txt from GitHub and build context for each (default limit 2000)
omnigraph-context-builder build-ubergraph-ontologies

# Process first 5 and output YAML
omnigraph-context-builder build-ubergraph-ontologies --limit 5 --format yaml

# Use a local ontologies.txt
omnigraph-context-builder build-ubergraph-ontologies --ontologies-file ./ontologies.txt

# Custom URL and write configs to config/
omnigraph-context-builder build-ubergraph-ontologies --ontologies-url https://... --write-configs
```

Options:

- `--ontologies-url` — URL of ontologies.txt (default: Ubergraph’s raw GitHub URL)
- `--ontologies-file` — local file (overrides `--ontologies-url` when set)
- `--endpoint`, `-o`, `--format`, `--limit`, `--skip-existing-config`, `--write-configs` — same as `build-ubergraph`
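The `graph_id` derivation from ontologies.txt URLs can be sketched as follows. The CLI's exact rules may differ; this sketch simply takes the last path segment and strips the `.owl` suffix:

```python
from urllib.parse import urlparse


def graph_id_from_owl_url(url: str) -> str:
    """Derive a graph_id like 'mondo' from an OWL URL in ontologies.txt."""
    name = urlparse(url).path.rsplit("/", 1)[-1]  # e.g. "mondo.owl"
    return name.removesuffix(".owl")


# graph_id_from_owl_url("http://purl.obolibrary.org/obo/mondo.owl") -> "mondo"
# which would then be used as:
# f"FROM <http://ubergraph.apps.renci.org/graphs/{graph_id}>"
```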
Discover properties and structure of a graph/ontology:

```bash
# Introspect a graph to discover properties and generate suggested config
python -m omnigraph_agent.context_builder.cli introspect nde --output suggested_nde.yaml

# For ontologies, discover OBO annotations and relations
python -m omnigraph_agent.context_builder.cli introspect vbo --output suggested_vbo.yaml
```

Create a YAML config file in `src/omnigraph_agent/context_builder/config/`:
For Knowledge Graphs:

```yaml
graph_id: "my_graph"
endpoint: "https://example.com/sparql"
repo_filter_property: "schema:includedInDataCatalog"  # Property that links entities to repositories/catalogs
entity_types:
  - "http://schema.org/Dataset"  # Main entity types in your graph (e.g., Dataset, Person, Organization)
dimensions: []  # Will be populated via introspection
text_blurb: "Description of your knowledge graph"
```

Configuration fields:

- `repo_filter_property`: The RDF property that links entities to repositories/catalogs within the SPARQL endpoint (e.g., `schema:includedInDataCatalog` for NDE). The context builder uses this property to:
  - Discover repositories by querying for distinct values of this property
  - Filter SPARQL queries to scope results to a specific repository when building repository-specific context files
  - Example: In NDE, `schema:includedInDataCatalog` links Dataset entities to DataCatalog entities (ImmPort, Vivli, etc.)
- `entity_types`: List of RDF types (full IRIs) that represent the main entities in your graph. These are used to filter queries and discover properties. Common examples: `http://schema.org/Dataset`, `http://schema.org/Person`, `http://schema.org/Organization`.
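The repository-discovery role of `repo_filter_property` can be sketched as a small query builder. The property and `schema:` prefix come from the NDE example above; the function itself is illustrative, not the builder's actual code:

```python
def repository_discovery_query(repo_filter_property: str) -> str:
    """Build a SPARQL query listing the distinct repository/catalog values."""
    return (
        "PREFIX schema: <http://schema.org/>\n"
        f"SELECT DISTINCT ?repo WHERE {{ ?entity {repo_filter_property} ?repo . }}"
    )


# repository_discovery_query("schema:includedInDataCatalog") would list
# DataCatalog entities such as ImmPort and Vivli in NDE.
```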
For Ontologies:

```yaml
graph_id: "my_ontology"
endpoint: "https://ubergraph.apps.renci.org/sparql"
repo_filter_property: ""  # Empty - ontologies don't use repository filters
entity_types:
  - "http://www.w3.org/2002/07/owl#Class"  # Typically owl:Class for OBO ontologies
dimensions: []  # Will be populated via introspection
text_blurb: "Description of your ontology"
```

Configuration fields:

- `repo_filter_property`: Set to `""` (empty string) for ontologies. Ontologies don't have repositories, so this isn't used.
- `entity_types`: For OBO ontologies, typically `http://www.w3.org/2002/07/owl#Class`. This represents the ontology classes you want to query.
The system now includes generic handlers that work automatically! For most graphs, you can skip creating custom handler classes. The system will:

- Auto-detect your config file
- Use `GenericGraph` for knowledge graphs (when `repo_filter_property` is set)
- Use `GenericOntologyGraph` for ontologies (when `repo_filter_property` is empty)

Simply run:

```bash
python -m omnigraph_agent.context_builder.cli build my_graph
```

The generic handlers will:

- Automatically detect repositories using the `repo_filter_property` from your config
- Work with any SPARQL endpoint
- Handle both knowledge graphs and ontologies
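The selection rule described above (custom handlers take precedence; `GenericGraph` when `repo_filter_property` is set; `GenericOntologyGraph` when it is empty) could look roughly like this. The class and registry names follow this README, but the function is a sketch, not the actual dispatch code:

```python
def select_handler(graph_id: str, config: dict, handlers: dict) -> str:
    """Pick a handler name for a graph, mirroring the rules described above."""
    if graph_id in handlers:                 # custom handler takes precedence
        return handlers[graph_id]
    if config.get("repo_filter_property"):   # non-empty -> knowledge graph
        return "GenericGraph"
    return "GenericOntologyGraph"            # empty "" -> ontology
```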
Custom Handlers (Optional)
If you need custom logic (e.g., special repository detection), you can still create a custom handler:

For Knowledge Graphs:

Create `src/omnigraph_agent/context_builder/graphs/my_graph.py`:

```python
from pathlib import Path
from typing import Dict, List, Optional

from .knowledge_graph import KnowledgeGraph


class MyGraphGraph(KnowledgeGraph):
    def __init__(self, config_path: Optional[Path] = None):
        if config_path is None:
            config_path = Path(__file__).parent.parent / "config" / "my_graph.yaml"
        super().__init__(config_path=config_path)

    def get_repositories(self) -> List[Dict[str, str]]:
        # Custom repository detection logic
        ...
```

Then register it in `src/omnigraph_agent/context_builder/graphs/__init__.py`:

```python
from .my_graph import MyGraphGraph

GRAPH_HANDLERS = {
    "nde": NDEGraph,
    "vbo": VBOGraph,
    "my_graph": MyGraphGraph,  # Custom handler takes precedence
}
```

Generate a suggested config via introspection:

```bash
python -m omnigraph_agent.context_builder.cli introspect my_graph --output suggested_my_graph.yaml
```

Review and update the suggested config, then move it to the config directory. Finally, build the context files:

```bash
python -m omnigraph_agent.context_builder.cli build my_graph
```

That's it! With the generic handler system, you only need to create a config file. The system will automatically discover and use the appropriate handler.
Context files include:

- Graph metadata: `graph_id`, `endpoint`, `entity_types`
- Dimensions: properties with coverage, distinct values, top values
- Prefixes: prefix-to-IRI mappings
- Query hints (for ontologies): namespace scope, reasoning mode, query patterns
- OBO Foundry metadata (for OBO ontologies): title, description, homepage, license, domain
- LLM-optimized fields (for ontologies):
  - `good_for`: list of use cases this ontology is good for
  - `query_patterns`: common SPARQL query patterns with examples
  - `connects_to`: cross-ontology connections for multi-hop queries
  - `domains`: domain classifications
All context files are written to `dist/context/` and can be consumed by:
- OmniGraph Agent for NL→SPARQL generation
- SPARQL Chrome extension for query generation
The bridge graph system generates cross-graph links between entities in different FRINK registry graphs using shared identifiers (e.g., GSE, NCT, MONDO, HGNC).
Bridge graphs enable querying across multiple knowledge graphs by:
- Discovering shared identifiers: Automatically finds identifier patterns (GSE, NCT, MONDO, etc.) that appear in multiple graphs
- Generating linksets: Creates RDF linksets that connect entities across graphs based on matching identifier values
- Providing bridge context: Generates JSON summaries of available cross-graph joins
Before generating bridge graphs, ensure you have context files for the graphs you want to link:

```bash
# Generate context files for all graphs you want to link
python -m omnigraph_agent.context_builder.cli build nde
python -m omnigraph_agent.context_builder.cli build mondo
# ... build contexts for other graphs
```

Generate linksets between two specific graphs:

```bash
# Generate linksets between NDE and another graph
python -m omnigraph_agent.context_builder.cli bridge generate nde bioproject
```

This will:

- Find shared identifier patterns (e.g., GSE, MONDO, NCT)
- Query both graphs for entities with matching identifiers
- Generate RDF linkset files in `dist/bridge/linksets/`

Outputs:

- `dist/bridge/linksets/nde__bioproject-gse.ttl` — linkset for GSE IDs
- `dist/bridge/linksets/nde__bioproject-mondo.ttl` — linkset for MONDO IDs
- etc.
Generate linksets for all graph pairs with shared join keys:

```bash
# Generate linksets for all pairs
python -m omnigraph_agent.context_builder.cli bridge generate-all
```

This automatically:

- Discovers all graph pairs that share identifier patterns
- Generates linksets only for pairs with actual shared keys
- Skips pairs without overlap

Discover which graph pairs can be linked (without generating):

```bash
# Discover pairs with shared join keys
python -m omnigraph_agent.context_builder.cli bridge discover
```

Output:
```text
Found 15 graph pair(s) with shared join keys:
  nde → bioproject
    Shared keys: GSE, MONDO, NCT
  nde → spoke-okn
    Shared keys: MONDO, HGNC
  ...
```
Generate JSON summaries of linksets for easy querying:

```bash
# Generate bridge context JSON files
python -m omnigraph_agent.context_builder.cli bridge generate-contexts
```

Outputs:

- `dist/context/bridge/nde__bioproject.json` — bridge context for NDE → bioproject
- `dist/context/bridge/index.json` — global index of all bridges
Bridge context structure:

```json
{
  "source_graph": "https://frink.apps.renci.org/nde/sparql",
  "target_graph": "https://frink.example.org/bioproject",
  "source_graph_id": "nde",
  "target_graph_id": "bioproject",
  "linksets": [
    {
      "join_key_type": "GSE_ID",
      "num_links": 10234,
      "min_confidence": 1.0,
      "max_confidence": 1.0,
      "linkset_iri": "https://wobd.org/bridge/linkset/nde__bioproject-gse"
    }
  ]
}
```

List all available bridge contexts:

```bash
# List available bridges
python -m omnigraph_agent.context_builder.cli bridge list
```

View detailed summary of links between two graphs:

```bash
# View summary of NDE → bioproject bridge
python -m omnigraph_agent.context_builder.cli bridge summary nde bioproject
```

Bridge graphs enable federated queries across multiple graphs. Example:
```sparql
PREFIX wobd-bridge: <https://wobd.org/bridge#>
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX schema: <http://schema.org/>
PREFIX biolink: <https://w3id.org/biolink/vocab/>

# Find all NDE datasets linked to bioproject entities via GSE IDs
SELECT ?nde_dataset ?bioproject_entity ?gse_id
WHERE {
  # Query NDE graph
  SERVICE <https://frink.apps.renci.org/nde/sparql> {
    ?nde_dataset a schema:Dataset .
    ?nde_dataset schema:identifier ?gse_id .
    FILTER (STRSTARTS(STR(?gse_id), "GSE"))
  }

  # Use bridge graph to find matching bioproject entity
  ?link wobd-bridge:sourceNode ?nde_dataset ;
        wobd-bridge:targetNode ?bioproject_entity ;
        wobd-bridge:joinKeyValue ?gse_id ;
        wobd-bridge:joinKeyType wobd-bridge:GSE_ID .

  # Query bioproject graph
  SERVICE <https://frink.example.org/bioproject> {
    ?bioproject_entity a biolink:Bioproject .
  }
}
```

The system automatically recognizes these identifier patterns:
- GSE_ID: GEO Series identifiers (GSE12345)
- NCT_ID: ClinicalTrials.gov identifiers (NCT00012345)
- MONDO_ID: MONDO disease ontology (MONDO:0000001)
- HGNC_ID: HGNC gene identifiers (HGNC:1234)
- GO_ID: Gene Ontology (GO:0008150)
- DOID_ID: Disease Ontology (DOID:4)
- HP_ID: Human Phenotype Ontology (HP:0000001)
- CHEBI_ID: ChEBI (CHEBI:15365)
- UniProtKB_ID: UniProt Knowledgebase (UniProtKB:P12345)
- PMID_ID: PubMed (PMID:12345678)
- PMC_ID: PubMed Central (PMC123456)
Additional patterns are automatically discovered from context files.
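This pattern recognition can be sketched with regexes inferred from the example IDs above; the real extractor's patterns and names may differ:

```python
import re

# Regexes assumed from the example identifiers listed above (illustrative only)
ID_PATTERNS = {
    "GSE_ID": re.compile(r"^GSE\d+$"),
    "NCT_ID": re.compile(r"^NCT\d+$"),
    "MONDO_ID": re.compile(r"^MONDO:\d{7}$"),
    "HGNC_ID": re.compile(r"^HGNC:\d+$"),
    "PMID_ID": re.compile(r"^PMID:\d+$"),
}


def classify_identifier(value: str):
    """Return the pattern name matching an identifier value, or None."""
    for name, pattern in ID_PATTERNS.items():
        if pattern.match(value):
            return name
    return None
```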
The bridge system includes a semantic mapping layer that understands when different identifier prefixes refer to the same semantic concept:

- Taxonomies: NCBITaxon, UniProtKB (taxon), ITIS, GBIF
  - Example: a graph using `NCBITaxon:9606` (human) can link to another using UniProtKB taxon IDs
- Genes: HGNC, Ensembl, MGI, ZFIN, FlyBase, WormBase, RGD
  - Example: a graph using `HGNC:1100` can link to another using `Ensembl:ENSG00000139618`
- Diseases: MONDO, DOID, OMIM, Orphanet
  - Example: a graph using `MONDO:0000001` can link to another using `DOID:4`
- Publications: PMID, PMC, DOI
  - Example: a graph using `PMID:12345678` can link to another using `PMC:123456`
When generating linksets, the system produces two kinds of matches:
- Exact matches: Link entities with identical identifier prefixes (e.g., both use GSE)
- Semantic matches: Link entities with semantically related identifiers (e.g., NCBITaxon ↔ UniProtKB taxon)
This enables the NL→SPARQL system to understand that queries about "taxonomies" or "species" can work across graphs even when they use different identifier systems.
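The semantic mapping layer can be sketched as a prefix-to-category table. The categories and prefixes come from the lists above; the table and function are illustrative, not the actual mapping code:

```python
# Prefix-to-category table derived from the groupings above (illustrative)
SEMANTIC_CATEGORIES = {
    "NCBITaxon": "TAXONOMY", "ITIS": "TAXONOMY", "GBIF": "TAXONOMY",
    "HGNC": "GENE", "Ensembl": "GENE", "MGI": "GENE", "ZFIN": "GENE",
    "MONDO": "DISEASE", "DOID": "DISEASE", "OMIM": "DISEASE", "Orphanet": "DISEASE",
    "PMID": "PUBLICATION", "PMC": "PUBLICATION", "DOI": "PUBLICATION",
}


def semantically_compatible(prefix_a: str, prefix_b: str) -> bool:
    """True if two identifier prefixes refer to the same semantic concept."""
    cat_a = SEMANTIC_CATEGORIES.get(prefix_a)
    return cat_a is not None and cat_a == SEMANTIC_CATEGORIES.get(prefix_b)
```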
The bridge system supports all graphs in the FRINK registry, including:
- NDE (NIAID Data Ecosystem)
- protookn graphs (biobricks-*, biohealth, spoke-okn, etc.)
- Ontologies (mondo, vbo, ubergraph, etc.)
See FRINK Registry for the complete list.
The context builder supports both JSON and YAML output formats. YAML is recommended for ontologies as it's more human-readable and easier to integrate with context packs.
```bash
# Generate YAML context
python -m omnigraph_agent.context_builder.cli build go --format yaml
```

For OBO ontologies (GO, MONDO, HP, NCBITaxon, etc.), the system automatically:
- Fetches metadata from the OBO Foundry registry
- Caches registry data locally (refreshed weekly)
- Merges OBO metadata with introspection results
OBO Foundry metadata includes:
- Full ontology name and description
- Homepage and license information
- Domain classification
- Imported ontologies
- Available file formats
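Merging registry metadata into a context might look roughly like this. The entry field names are assumed for illustration, not taken from the actual OBO Foundry registry schema:

```python
def extract_obo_metadata(entry: dict) -> dict:
    """Pull the metadata fields listed above from a registry entry.

    Field names here are assumptions for the sketch, not the real schema.
    """
    return {
        "title": entry.get("title"),
        "description": entry.get("description"),
        "homepage": entry.get("homepage"),
        "license": (entry.get("license") or {}).get("label"),
        "domain": entry.get("domain"),
    }
```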
The context builder automatically enriches ontology contexts with fields optimized for LLM-powered multi-hop query planning:

- `good_for`: list of use cases the ontology is useful for
  - Example: `["Disease name resolution", "Disease classification", "Cross-referencing to clinical databases"]`
- `query_patterns`: common SPARQL query patterns with examples
  - Includes patterns for hierarchy traversal, label lookup, cross-references
  - Each pattern includes a SPARQL hint and usage notes
- `connects_to`: cross-ontology connections for multi-hop queries
  - Documents how ontologies link together (e.g., GO → MONDO via genes)
  - Includes example queries and typical hop counts
  - Helps LLMs route queries across multiple graphs
- `domains`: domain classifications
  - Example: `["biological_process", "molecular_function", "cellular_component"]`
These fields enable intelligent query routing in systems like OKN-WOBD that use GPT-4o for multi-hop query planning.
The project includes a test suite to verify the bridge graph system functionality and fixes. Tests are located in the tests/ directory and use pytest.
First, install the development dependencies:

```bash
# Install dev dependencies (includes pytest)
pip install -e ".[dev]"
# or with uv
uv pip install -e ".[dev]"
```

Then run the tests:

```bash
# Run all tests
pytest

# Run with verbose output
pytest -v

# Run a specific test file
pytest tests/test_bridge_fixes.py

# Run a specific test class
pytest tests/test_bridge_fixes.py::TestPMCPattern

# Run a specific test
pytest tests/test_bridge_fixes.py::TestPMCPattern::test_pmc_pattern_matches_without_colon

# Skip integration tests (require network: Ubergraph, OBO Foundry)
pytest -m "not integration"

# Run only integration tests (build mondo --format yaml, check good_for/connects_to)
pytest -m integration
```

The test suite includes:
- PMC Pattern Tests: verifies PMC ID regex patterns work correctly
- Semantic Category Tests: verifies identifier semantic categorization (e.g., UniProtKB as TAXONOMY)
- Regex Pattern Tests: verifies entity IRI regex handles special characters
- Namespace Tests: verifies proper RDF namespace usage in the bridge context generator
- Link Count Tests: verifies accurate link counting using RDF queries
- Identifier Pattern Extraction Tests: verifies pattern extraction from identifier values
- YAMLContextWriter Tests: round-trip and Pydantic model writing
- LLMContextEnricher Tests: `infer_good_for`, `extract_query_patterns`, `discover_connections`, `enrich_context`
- OBOfoundryClient Tests: metadata lookup, caching, `_normalize_ontology_id` (mocked registry)
- Integration (marker `integration`): full `build mondo --format yaml` with `good_for` and `connects_to` checks; requires network