BacDive transformer opens ncbitaxon.owl with a sqlite adapter (path/format mismatch) #544

@turbomam

Symptom

The BacDive transformer fails when opening the NCBI taxonomy resource: get_adapter is called with sqlite:…ncbitaxon.owl, i.e. a sqlite: selector pointing at an OWL file.

Cause

# constants.py
NCBITAXON_SOURCE = RAW_DATA_DIR / "ncbitaxon.owl"

# kg_microbe/transform_utils/bacdive/bacdive.py
self.ncbi_impl = get_adapter(f"sqlite:{NCBITAXON_SOURCE}")  # sqlite: + .owl — wrong

The sqlite adapter expects a SQLite .db file, not an OWL file. The ncbitaxon.db needed here already exists under data/raw/ (built from the OWL via semsql).

Recommended fix

self.ncbi_impl = get_adapter(f"sqlite:{RAW_DATA_DIR}/ncbitaxon.db")

This is a single-line change in the BacDive transformer. It points the adapter at the already-built local ncbitaxon.db.
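
To make this class of mismatch fail loudly instead of at adapter-open time, the selector could be built through a small guard. This is only a sketch: `sqlite_selector` is a hypothetical helper, not something that exists in kg-microbe or OAK, and it checks nothing beyond the file suffix.

```python
from pathlib import Path


def sqlite_selector(db_path: Path) -> str:
    """Build a 'sqlite:' selector string, refusing paths that are not .db files.

    Hypothetical helper for illustration; it only inspects the suffix and
    does not check that the file exists or is a valid SQLite database.
    """
    db_path = Path(db_path)
    if db_path.suffix != ".db":
        raise ValueError(f"sqlite adapter expects a .db file, got: {db_path}")
    return f"sqlite:{db_path}"
```

With a guard like this, passing the OWL path raises a ValueError naming the offending file, while the .db path yields the selector string that would be handed to get_adapter.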

Alternative: add a Makefile rule to build `ncbitaxon.db` from `ncbitaxon.owl` via semsql/runoak

Shape: data/raw/ncbitaxon.db: data/raw/ncbitaxon.owl, so the DB is rebuilt whenever it is missing or older than the OWL. This generalizes to fresh clones that don't already have the DB built. It is more work, and its value depends on whether the DB is typically pre-provisioned.
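
For reference, the dependency check that such a rule encodes can be sketched in Python. `needs_rebuild` is a hypothetical name, and the actual build step (semsql or runoak) is deliberately left out because the exact command is not specified in this issue.

```python
from pathlib import Path


def needs_rebuild(target: Path, source: Path) -> bool:
    """Mirror Make's rule: rebuild if target is missing or older than source.

    Hypothetical helper; the command that actually produces the .db from
    the .owl is not shown here.
    """
    if not source.exists():
        raise FileNotFoundError(f"prerequisite not found: {source}")
    if not target.exists():
        return True
    # Compare modification times, as Make does for file targets.
    return target.stat().st_mtime < source.stat().st_mtime
```

A transform entry point could call this with the .db and .owl paths and trigger the build only when it returns True.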

Alternative: split into separate OWL and DB constants in constants.py

Reduces ambiguity but requires updating all callsites that use NCBITAXON_SOURCE.
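
A minimal sketch of what the split could look like. The names `NCBITAXON_OWL`, `NCBITAXON_DB`, and `NCBITAXON_SQLITE_SELECTOR` are hypothetical, and `RAW_DATA_DIR` is redefined here only so the snippet is self-contained; the assumption is that it resolves to data/raw as described above.

```python
from pathlib import Path

# Assumption: stands in for RAW_DATA_DIR from constants.py (data/raw).
RAW_DATA_DIR = Path("data") / "raw"

# Hypothetical names: one constant per file format, so a callsite cannot
# accidentally pair the sqlite: scheme with the OWL file.
NCBITAXON_OWL = RAW_DATA_DIR / "ncbitaxon.owl"
NCBITAXON_DB = RAW_DATA_DIR / "ncbitaxon.db"

# Selector string for the sqlite adapter, derived only from the .db constant.
NCBITAXON_SQLITE_SELECTOR = f"sqlite:{NCBITAXON_DB.as_posix()}"
```

Callsites that need the OWL file would switch to the OWL constant, and the BacDive transformer would pass the selector constant to get_adapter.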
