130 changes: 130 additions & 0 deletions src/scripts/migrate/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,130 @@
# COB Migration

This folder contains scripts to help migrate your OBO ontology to COB.
There is a table of terms to migrate,
and a script to convert that table into instructions for
[ROBOT](https://robot.obolibrary.org).
Using ROBOT you can

1. get a list of the relevant terms in your ontology
2. remove them from your ontology
3. replace them with either strict or suggested replacement terms

These tools are meant to help migrate to COB,
but manual review is required to ensure that no mistakes are made.

## Basics

The relevant terms are listed in `migrate.tsv`.
There are columns for

- the target term and its label
- a comment about what should be done
- the more general replacement term and its label
- the more specific suggested term and its label

Replacing a term with the more general term should always be safe.
Replacing a term with the more specific term
is the better choice in most but not all cases.
If the replacement is `owl:Thing`,
then it is better to remove that term,
or that part of an axiom, completely.

When the `migrate.tsv` table changes,
run the `generate.py` script to update
the SPARQL query and ROBOT instruction files.
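
As a minimal sketch of the replacement rule described above,
the choice for a single row of `migrate.tsv` could be written as follows.
The function name `choose_replacement` and the `prefer_suggestion` flag are hypothetical
and are not part of `generate.py`;
only the column names come from `migrate.tsv`.

```python
def choose_replacement(row, prefer_suggestion=True):
    """Return the replacement CURIE for one migrate.tsv row, or None to drop the term.

    `row` is a dict keyed by the migrate.tsv column names.
    """
    target = row["Replacement"]
    # The more specific suggestion is optional, so fall back to the strict replacement.
    if prefer_suggestion and row.get("Suggestion"):
        target = row["Suggestion"]
    # owl:Thing means "remove the term or that part of the axiom" rather than rename it.
    return None if target == "owl:Thing" else target
```

A `None` result marks a term that needs manual removal rather than an automatic rename.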

## 1. Report

The first step is to report on which terms, if any,
need to be changed.
The `migrate.tsv` file lists them,
and the `report.rq` SPARQL query will search for them in your ontology.

It's best to run this query on your ontology's base file,
so that it only considers the terms and axioms
that your project is responsible for.
If you run the query on your ontology's full release file,
you may find problems with upstream ontologies.
In this case, the best approach is to update your import strategy
so that it is compatible with your new use of COB.

```sh
robot query --input your-base.owl --query report.rq report.tsv
```
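
To get a quick overview of the report,
a short Python sketch such as the one below can load it into a list of dicts.
It assumes that the TSV header names start with `?`
and that string values are wrapped in double quotes,
which may differ between ROBOT versions;
`read_report` is a hypothetical helper, not part of this folder.

```python
import csv
import io


def read_report(stream):
    """Parse a tab-separated report into a list of dicts with cleaned keys and values."""
    # QUOTE_NONE keeps quote characters literal so they can be stripped explicitly.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = [name.lstrip("?") for name in next(reader, [])]
    rows = []
    for values in reader:
        if not values:
            continue  # skip blank lines
        cleaned = [value.strip('"') for value in values]
        rows.append(dict(zip(header, cleaned)))
    return rows


# Usage with an in-memory sample; for a real file use open("report.tsv", newline="").
sample = '?curie\t?label\n"BFO:0000030"\t"object"\n'
for row in read_report(io.StringIO(sample)):
    print(row["curie"], row["label"])
```

An empty list means that none of the terms in `migrate.tsv` were found.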

## 2. Remove

ROBOT can help migrate to COB by removing the terms listed in `migrate.tsv`
from your OWL files.
The best approach is to remove the terms from your project's "edit" file,
e.g. `obi-edit.owl`,
and the ROBOT command below will help with that.
However, the ROBOT command is not aware of any templates or import definitions.
If you use the terms in `migrate.tsv` anywhere in your templates or imports,
then you will have to make those changes manually,
guided by the report.
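
One way to find those manual changes is to search your template and import files
for the IDs listed in `migrate.tsv`.
The sketch below is only an illustration:
`find_term_uses` is a hypothetical helper,
and it assumes that your files refer to terms by CURIE (such as `BFO:0000030`)
rather than by label or full IRI.

```python
import re


def find_term_uses(text, term_ids):
    """Return {term_id: [line numbers]} for each migrated term mentioned in `text`."""
    hits = {}
    for number, line in enumerate(text.splitlines(), start=1):
        for term_id in term_ids:
            # Match the CURIE as a whole token, so that BFO:0000030
            # does not match inside a longer identifier.
            pattern = rf"(?<![\w:]){re.escape(term_id)}(?!\w)"
            if re.search(pattern, line):
                hits.setdefault(term_id, []).append(number)
    return hits


# Usage with an in-memory sample standing in for a template file.
template = "ID\tParent\nEX:1\tBFO:0000030\n"
print(find_term_uses(template, ["BFO:0000030", "BFO:0000027"]))
```

Each reported line number points at a place that the ROBOT command will not change for you.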

Another option is to run the ROBOT commands on your project's release files,
including the full release and base file.
This is useful for testing or as part of a migration strategy,
where you inform your users of the upcoming changes.
However, the best long-term solution is to change your source files.

We use ROBOT to remove the terms in `migrate.tsv` from an OWL file in two steps.
First we remove all the annotations on those terms,
then we remove subclass, equivalent class, and disjoint class axioms.
After that, the only remaining references to those terms
will be inside your other axioms.
In the next step we replace the remaining term references with the new terms.

```sh
robot remove \
  --input target.owl \
  --term-file remove-annotations.txt \
  --axioms "annotation" \
  --preserve-structure false \
  remove \
  --term-file remove-axioms.txt \
  --axioms "subclass equivalent disjoint" \
  --signature true \
  --trim false \
  --preserve-structure false \
  --output removed.owl
```

## 3. Replace

The final step is to replace the migrated terms.
You have two choices:

1. `strict-replacement.tsv` migrates to more general terms, which should always be safe
2. `suggested-replacement.tsv` migrates to more specific terms, which are better replacements in most but not all cases

The best option is to use the migration report to guide you,
and manually review your ontology
to see if all the relevant suggested replacements will work.
If any suggestions do not work, then use the strict replacement instead.
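Write a new `replacement.tsv` file with your preferences,
then run the ROBOT command with `--mappings` pointing at the file you want to apply
(the example uses `strict-replacement.tsv`):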

```sh
robot rename \
  --input removed.owl \
  --mappings strict-replacement.tsv \
  --allow-missing-entities true \
  --allow-duplicates true \
  --output renamed.owl
```
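
If you accept most suggestions but want the strict replacement for a few terms,
the two generated mapping files can be combined instead of edited by hand.
The following is a sketch under that assumption;
`read_mappings`, `merge_mappings`, and the `use_strict_for` parameter are hypothetical names,
and the two-column layout is the one written by `generate.py`.

```python
import csv
import io


def read_mappings(text):
    """Read a two-column mapping table (with a header row) into an ordered dict."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader, None)  # skip the "Old IRI / New IRI" header
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def merge_mappings(strict_text, suggested_text, use_strict_for=()):
    """Start from the suggested mappings and fall back to strict ones where requested."""
    strict = read_mappings(strict_text)
    suggested = read_mappings(suggested_text)
    lines = ["Old IRI\tNew IRI"]
    for old, new in suggested.items():
        if old in use_strict_for:
            new = strict[old]
        lines.append(f"{old}\t{new}")
    return "\n".join(lines) + "\n"


# Usage with in-memory samples; real input would be read from the generated files.
strict = "Old IRI\tNew IRI\nBFO:0000003\towl:Thing\n"
suggested = "Old IRI\tNew IRI\nBFO:0000003\tBFO:0000015\n"
print(merge_mappings(strict, suggested, use_strict_for={"BFO:0000003"}))
```

The returned text can be saved under whatever name you pass to `--mappings`.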

## Next Steps

If you run the report query again after the remove and replace steps,
the resulting report should be empty.
Make sure to manually review the changes.
The [`robot diff`](https://robot.obolibrary.org/diff) command
will help you compare your OWL files before and after.

If you have questions or concerns,
feel free to open an issue here in the COB repository.

84 changes: 84 additions & 0 deletions src/scripts/migrate/generate.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,84 @@
import csv


def write_lines(path, lines):
    """Write each line to `path`, terminated by a newline."""
    with open(path, "w") as f:
        for line in lines:
            f.write(line + "\n")


def generate_report(path, rows):
    """Write a SPARQL query that reports migrated terms that are present and not deprecated."""
    lines = [
        "PREFIX owl: <http://www.w3.org/2002/07/owl#>",
        "PREFIX BFO: <http://purl.obolibrary.org/obo/BFO_>",
        "PREFIX CHEBI: <http://purl.obolibrary.org/obo/CHEBI_>",
        "PREFIX OBI: <http://purl.obolibrary.org/obo/OBI_>",
        "",
        "SELECT DISTINCT ?curie ?label ?comment ?replacement ?replacement_label ?suggestion ?suggestion_label",
        "WHERE {",
        "  VALUES (?iri ?curie ?label ?comment ?replacement ?replacement_label ?suggestion ?suggestion_label) {",
    ]
    for row in rows:
        # One VALUES tuple per term: the IRI to match, then the table columns as strings.
        lines.append(f"""    ({row['ID']} "{row['ID']}" "{row['Label']}" "{row['Comment']}" "{row['Replacement']}" "{row['Replacement Label']}" "{row['Suggestion']}" "{row['Suggestion Label']}")""")
    lines += [
        "  }",
        "  ?iri ?p ?o",
        "  FILTER NOT EXISTS {",
        "    ?iri owl:deprecated ?deprecated",
        "  }",
        "}",
    ]
    write_lines(path, lines)


def generate_remove_annotations(path, rows):
    """Write the term file for the first `robot remove` step (annotations)."""
    write_lines(path, [f"{row['ID']} # {row['Label']}" for row in rows])


def generate_remove_axioms(path, rows):
    """Write the term file for the second `robot remove` step, including signature terms."""
    lines = []
    signature = set()
    for row in rows:
        lines.append(f"{row['ID']} # {row['Label']}")
        if row['Signature']:
            signature.update(row['Signature'].split())
    # Set iteration order is arbitrary, so the order of these lines can vary between runs.
    for term in signature:
        lines.append(term)
    write_lines(path, lines)


def generate_strict_replacement(path, rows):
    """Write the mapping table that always uses the more general replacement."""
    lines = ["Old IRI\tNew IRI"]
    for row in rows:
        lines.append(f"{row['ID']}\t{row['Replacement']}")
    write_lines(path, lines)


def generate_suggested_replacement(path, rows):
    """Write the mapping table that prefers the more specific suggestion when there is one."""
    lines = ["Old IRI\tNew IRI"]
    for row in rows:
        if row['Suggestion']:
            lines.append(f"{row['ID']}\t{row['Suggestion']}")
        else:
            lines.append(f"{row['ID']}\t{row['Replacement']}")
    write_lines(path, lines)


if __name__ == "__main__":
    path = "migrate.tsv"
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f, delimiter="\t"))
    generate_report("report.rq", rows)
    generate_remove_annotations("remove-annotations.txt", rows)
    generate_remove_axioms("remove-axioms.txt", rows)
    generate_strict_replacement("strict-replacement.tsv", rows)
    generate_suggested_replacement("suggested-replacement.tsv", rows)

35 changes: 35 additions & 0 deletions src/scripts/migrate/migrate.tsv
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
ID	Label	Comment	Replacement	Replacement Label	Suggestion	Suggestion Label	Signature
BFO:0000001	entity	does no work: remove or use owl:Thing	owl:Thing				
BFO:0000002	continuant	use something more specific, or a disjunction (X or Y)	owl:Thing		owl:Thing		
BFO:0000004	independent continuant	generalized to owl:Thing, or constrain to 'material entity'	owl:Thing		BFO:0000040	material entity	
BFO:0000030	object	generalize to 'material entity'	BFO:0000040	material entity			
BFO:0000027	object aggregate	generalize to 'material entity'	BFO:0000040	material entity			
BFO:0000024	fiat object part	generalize to 'material entity'	BFO:0000040	material entity			
BFO:0000140	continuant fiat boundary	generalize to 'immaterial entity'	BFO:0000141	immaterial entity			
BFO:0000147	zero-dimensional continuant fiat boundary	generalize to 'immaterial entity'	BFO:0000141	immaterial entity			
BFO:0000142	one-dimensional continuant fiat boundary	generalize to 'immaterial entity'	BFO:0000141	immaterial entity			
BFO:0000146	two-dimensional continuant fiat boundary	generalize to 'immaterial entity'	BFO:0000141	immaterial entity			
BFO:0000006	spatial region	generalize to 'immaterial entity'	BFO:0000141	immaterial entity			
BFO:0000018	zero-dimensional spatial region	generalize to 'immaterial entity'	BFO:0000141	immaterial entity			
BFO:0000026	one-dimensional spatial region	generalize to 'immaterial entity'	BFO:0000141	immaterial entity			
BFO:0000009	two-dimensional spatial region	generalize to 'immaterial entity'	BFO:0000141	immaterial entity			
BFO:0000028	three-dimensional spatial region	generalize to 'immaterial entity'	BFO:0000141	immaterial entity			
BFO:0000020	specifically dependent continuant	use 'characteristic'	owl:Thing		COB:0000502	characteristic	
BFO:0000019	quality	use 'characteristic'	COB:0000502	characteristic			
BFO:0000145	relational quality	use 'characteristic'	COB:0000502	characteristic			
BFO:0000031	generically dependent continuant	constrain to 'information entity'	owl:Thing		IAO:0000030	information content entity	
BFO:0000003	occurrent	constrain to 'process'	owl:Thing		BFO:0000015	process	
BFO:0000182	history	generalize to 'process'	BFO:0000015	process			
BFO:0000035	process boundary	better to just use the start and end of a process	owl:Thing				
BFO:0000008	temporal region	you probably want a process instead	owl:Thing				
BFO:0000148	zero-dimensional temporal region	you probably want a process instead	owl:Thing				
BFO:0000038	one-dimensional temporal region	you probably want a process instead	owl:Thing				
BFO:0000144	process profile	use 'characteristic'	COB:0000502	characteristic			
CHEBI:23367	molecular entity	replace with COB term	COB:0000013	molecule			
CHEBI:33250	atom	replace with COB term	COB:0000011	atom			
CHEBI:33696	nucleic acid	replace with COB term	COB:0000049	nucleic acid chain			
OBI:0000011	planned process	replace with COB term	COB:0000035	completely executed planned process			BFO:0000015 BFO:0000055 RO:0000059 IAO:0000104
OBI:0000047	processed material	replace with COB term	COB:0000026	processed material entity			
OBI:0000094	material processing	replace with COB term	COB:0000110	material processing			BFO:0000040 OBI:0000047 OBI:0000293 OBI:0000299 OBI:0000417 OBI:0000456 OBI:0600010
OBI:0000968	device	replace with COB term	COB:0001300	device			BFO:0000034 BFO:0000040 OBI:0000086 OBI:0000094 OBI:0000312 RO:0000085 RO:0000087
OBI:0100026	organism	replace with COB term	COB:0000022	organism			BFO:0000040 NCBITaxon:10239 NCBITaxon:2 NCBITaxon:2157 NCBITaxon:2759
34 changes: 34 additions & 0 deletions src/scripts/migrate/remove-annotations.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
BFO:0000001 # entity
BFO:0000002 # continuant
BFO:0000004 # independent continuant
BFO:0000030 # object
BFO:0000027 # object aggregate
BFO:0000024 # fiat object part
BFO:0000140 # continuant fiat boundary
BFO:0000147 # zero-dimensional continuant fiat boundary
BFO:0000142 # one-dimensional continuant fiat boundary
BFO:0000146 # two-dimensional continuant fiat boundary
BFO:0000006 # spatial region
BFO:0000018 # zero-dimensional spatial region
BFO:0000026 # one-dimensional spatial region
BFO:0000009 # two-dimensional spatial region
BFO:0000028 # three-dimensional spatial region
BFO:0000020 # specifically dependent continuant
BFO:0000019 # quality
BFO:0000145 # relational quality
BFO:0000031 # generically dependent continuant
BFO:0000003 # occurrent
BFO:0000182 # history
BFO:0000035 # process boundary
BFO:0000008 # temporal region
BFO:0000148 # zero-dimensional temporal region
BFO:0000038 # one-dimensional temporal region
BFO:0000144 # process profile
CHEBI:23367 # molecular entity
CHEBI:33250 # atom
CHEBI:33696 # nucleic acid
OBI:0000011 # planned process
OBI:0000047 # processed material
OBI:0000094 # material processing
OBI:0000968 # device
OBI:0100026 # organism
55 changes: 55 additions & 0 deletions src/scripts/migrate/remove-axioms.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
BFO:0000001 # entity
BFO:0000002 # continuant
BFO:0000004 # independent continuant
BFO:0000030 # object
BFO:0000027 # object aggregate
BFO:0000024 # fiat object part
BFO:0000140 # continuant fiat boundary
BFO:0000147 # zero-dimensional continuant fiat boundary
BFO:0000142 # one-dimensional continuant fiat boundary
BFO:0000146 # two-dimensional continuant fiat boundary
BFO:0000006 # spatial region
BFO:0000018 # zero-dimensional spatial region
BFO:0000026 # one-dimensional spatial region
BFO:0000009 # two-dimensional spatial region
BFO:0000028 # three-dimensional spatial region
BFO:0000020 # specifically dependent continuant
BFO:0000019 # quality
BFO:0000145 # relational quality
BFO:0000031 # generically dependent continuant
BFO:0000003 # occurrent
BFO:0000182 # history
BFO:0000035 # process boundary
BFO:0000008 # temporal region
BFO:0000148 # zero-dimensional temporal region
BFO:0000038 # one-dimensional temporal region
BFO:0000144 # process profile
CHEBI:23367 # molecular entity
CHEBI:33250 # atom
CHEBI:33696 # nucleic acid
OBI:0000011 # planned process
OBI:0000047 # processed material
OBI:0000094 # material processing
OBI:0000968 # device
OBI:0100026 # organism
OBI:0000047
NCBITaxon:2157
BFO:0000055
OBI:0000094
NCBITaxon:2
BFO:0000040
NCBITaxon:10239
NCBITaxon:2759
OBI:0000299
BFO:0000015
OBI:0600010
IAO:0000104
OBI:0000086
RO:0000085
OBI:0000417
OBI:0000312
OBI:0000293
RO:0000087
OBI:0000456
RO:0000059
BFO:0000034
48 changes: 48 additions & 0 deletions src/scripts/migrate/report.rq
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX BFO: <http://purl.obolibrary.org/obo/BFO_>
PREFIX CHEBI: <http://purl.obolibrary.org/obo/CHEBI_>
PREFIX OBI: <http://purl.obolibrary.org/obo/OBI_>

SELECT DISTINCT ?curie ?label ?comment ?replacement ?replacement_label ?suggestion ?suggestion_label
WHERE {
VALUES (?iri ?curie ?label ?comment ?replacement ?replacement_label ?suggestion ?suggestion_label) {
(BFO:0000001 "BFO:0000001" "entity" "does no work: remove or use owl:Thing" "owl:Thing" "" "" "")
(BFO:0000002 "BFO:0000002" "continuant" "use something more specific, or a disjunction (X or Y)" "owl:Thing" "" "owl:Thing" "")
(BFO:0000004 "BFO:0000004" "independent continuant" "generalized to owl:Thing, or constrain to 'material entity'" "owl:Thing" "" "BFO:0000040" "material entity")
(BFO:0000030 "BFO:0000030" "object" "generalize to 'material entity'" "BFO:0000040" "material entity" "" "")
(BFO:0000027 "BFO:0000027" "object aggregate" "generalize to 'material entity'" "BFO:0000040" "material entity" "" "")
(BFO:0000024 "BFO:0000024" "fiat object part" "generalize to 'material entity'" "BFO:0000040" "material entity" "" "")
(BFO:0000140 "BFO:0000140" "continuant fiat boundary" "generalize to 'immaterial entity'" "BFO:0000141" "immaterial entity" "" "")
(BFO:0000147 "BFO:0000147" "zero-dimensional continuant fiat boundary" "generalize to 'immaterial entity'" "BFO:0000141" "immaterial entity" "" "")
(BFO:0000142 "BFO:0000142" "one-dimensional continuant fiat boundary" "generalize to 'immaterial entity'" "BFO:0000141" "immaterial entity" "" "")
(BFO:0000146 "BFO:0000146" "two-dimensional continuant fiat boundary" "generalize to 'immaterial entity'" "BFO:0000141" "immaterial entity" "" "")
(BFO:0000006 "BFO:0000006" "spatial region" "generalize to 'immaterial entity'" "BFO:0000141" "immaterial entity" "" "")
(BFO:0000018 "BFO:0000018" "zero-dimensional spatial region" "generalize to 'immaterial entity'" "BFO:0000141" "immaterial entity" "" "")
(BFO:0000026 "BFO:0000026" "one-dimensional spatial region" "generalize to 'immaterial entity'" "BFO:0000141" "immaterial entity" "" "")
(BFO:0000009 "BFO:0000009" "two-dimensional spatial region" "generalize to 'immaterial entity'" "BFO:0000141" "immaterial entity" "" "")
(BFO:0000028 "BFO:0000028" "three-dimensional spatial region" "generalize to 'immaterial entity'" "BFO:0000141" "immaterial entity" "" "")
(BFO:0000020 "BFO:0000020" "specifically dependent continuant" "use 'characteristic'" "owl:Thing" "" "COB:0000502" "characteristic")
(BFO:0000019 "BFO:0000019" "quality" "use 'characteristic'" "COB:0000502" "characteristic" "" "")
(BFO:0000145 "BFO:0000145" "relational quality" "use 'characteristic'" "COB:0000502" "characteristic" "" "")
(BFO:0000031 "BFO:0000031" "generically dependent continuant" "constrain to 'information entity'" "owl:Thing" "" "IAO:0000030" "information content entity")
(BFO:0000003 "BFO:0000003" "occurrent" "constrain to 'process'" "owl:Thing" "" "BFO:0000015" "process")
(BFO:0000182 "BFO:0000182" "history" "generalize to 'process'" "BFO:0000015" "process" "" "")
(BFO:0000035 "BFO:0000035" "process boundary" "better to just use the start and end of a process" "owl:Thing" "" "" "")
(BFO:0000008 "BFO:0000008" "temporal region" "you probably want a process instead" "owl:Thing" "" "" "")
(BFO:0000148 "BFO:0000148" "zero-dimensional temporal region" "you probably want a process instead" "owl:Thing" "" "" "")
(BFO:0000038 "BFO:0000038" "one-dimensional temporal region" "you probably want a process instead" "owl:Thing" "" "" "")
(BFO:0000144 "BFO:0000144" "process profile" "use 'characteristic'" "COB:0000502" "characteristic" "" "")
(CHEBI:23367 "CHEBI:23367" "molecular entity" "replace with COB term" "COB:0000013" "molecule" "" "")
(CHEBI:33250 "CHEBI:33250" "atom" "replace with COB term" "COB:0000011" "atom" "" "")
(CHEBI:33696 "CHEBI:33696" "nucleic acid" "replace with COB term" "COB:0000049" "nucleic acid chain" "" "")
(OBI:0000011 "OBI:0000011" "planned process" "replace with COB term" "COB:0000035" "completely executed planned process" "" "")
(OBI:0000047 "OBI:0000047" "processed material" "replace with COB term" "COB:0000026" "processed material entity" "" "")
(OBI:0000094 "OBI:0000094" "material processing" "replace with COB term" "COB:0000110" "material processing" "" "")
(OBI:0000968 "OBI:0000968" "device" "replace with COB term" "COB:0001300" "device" "" "")
(OBI:0100026 "OBI:0100026" "organism" "replace with COB term" "COB:0000022" "organism" "" "")
}
?iri ?p ?o
FILTER NOT EXISTS {
?iri owl:deprecated ?deprecated
}
}