OWL-NETS 2.0
Original Repository: OWL-NETS
Example Application: OWLNETS_Example_Application.ipynb
Purpose: OWL-NETS (NEtwork Transformation for Statistical learning) is a computational method that reversibly abstracts Web Ontology Language (OWL)-encoded biomedical knowledge into a more biologically meaningful network representation. OWL-NETS generates semantically rich knowledge graphs that contain heterogeneous nodes and edges and can be used for tasks that do not require OWL semantics.
Publication for V1.0:
Callahan TJ, Baumgartner WA, Bada M, Stefanski AL, Tripodi I, White EK, Hunter LE. OWL-NETS: Transforming OWL Representations for Improved Network Inference. Pac Symp Biocomput. 2018;23:133-144. PMID:29218876; PMCID:PMC5737627
OWL-NETS 2.0: This wiki discusses an alternative and arguably more generalizable adaptation of the original project. This new version was developed as a fundamental component of the PheKnowLator project to decode OWL-encoded classes.
An ontology or knowledge graph built using OWL contains two types of entities that we'd like to decode when transforming it into an OWL-NETS representation: (1) owl:Class and (2) owl:Axiom. While each of the components shown below is needed to build a semantically rich knowledge graph, the majority of the information used to construct each object is not biologically or clinically meaningful. Thus, the goal of the current algorithm is to decode all OWL-encoded classes and axioms (like those shown below) into something more biologically or clinically meaningful.
<!-- http://purl.obolibrary.org/obo/CL_0000995 -->
<owl:Class rdf:about="http://purl.obolibrary.org/obo/CL_0000995">
<owl:equivalentClass>
<owl:Class>
<owl:unionOf rdf:parseType="Collection">
<rdf:Description rdf:about="http://purl.obolibrary.org/obo/CL_0001021"/>
<rdf:Description rdf:about="http://purl.obolibrary.org/obo/CL_0001026"/>
</owl:unionOf>
</owl:Class>
</owl:equivalentClass>
<rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/CL_0001060"/>
</owl:Class>
The OWL class CL_0000995 (i.e. CD34-positive, CD38-positive common myeloid progenitor OR CD34-positive, CD38-positive common lymphoid progenitor) was built by taking the union of:
- CL_0001021 (i.e. CD34-positive, CD38-positive common lymphoid progenitor)
- CL_0001026 (i.e. CD34-positive, CD38-positive common myeloid progenitor)
OWL-NETS would decode this class into:
CL_0001021, rdfs:subClassOf, CL_0000995
CL_0001026, rdfs:subClassOf, CL_0000995
CL_0000995, rdfs:subClassOf, CL_0001060
<!-- http://purl.obolibrary.org/obo/HP_0000340 -->
<owl:Class rdf:about="http://purl.obolibrary.org/obo/HP_0000340">
<owl:equivalentClass>
<owl:Restriction>
<owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/BFO_0000051"/>
<owl:someValuesFrom>
<owl:Class>
<owl:intersectionOf rdf:parseType="Collection">
<rdf:Description rdf:about="http://purl.obolibrary.org/obo/PATO_0001481"/>
<owl:Restriction>
<owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0000052"/>
<owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/UBERON_0008200"/>
</owl:Restriction>
<owl:Restriction>
<owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0002573"/>
<owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/PATO_0000460"/>
</owl:Restriction>
</owl:intersectionOf>
</owl:Class>
</owl:someValuesFrom>
</owl:Restriction>
</owl:equivalentClass>
<rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/HP_0000290"/>
</owl:Class>
The OWL class HP_0000340 (i.e. sloping forehead) was built by taking the intersection of:
- PATO_0001481, RO_0000052, UBERON_0008200 (i.e. sloped, inheres in, forehead)
- PATO_0001481, RO_0002573, PATO_0000460 (i.e. sloped, has modifier, abnormal)
OWL-NETS would decode this class into:
HP_0000340, RO_0000086, PATO_0001481
HP_0000340, RO_0000052, UBERON_0008200
HP_0000340, RO_0002573, PATO_0000460
HP_0000340, rdfs:subClassOf, HP_0000290
<!-- http://purl.obolibrary.org/obo/GO_0000785 -->
<owl:Class rdf:about="http://purl.obolibrary.org/obo/GO_0000785">
<rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/GO_0110165"/>
<rdfs:subClassOf>
<owl:Restriction>
<owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/BFO_0000050"/>
<owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/GO_0005694"/>
</owl:Restriction>
</rdfs:subClassOf>
</owl:Class>
The OWL class GO_0000785 (i.e. chromatin) is restricted to being BFO_0000050 (i.e. part of) some GO_0005694 (i.e. chromosome).
OWL-NETS would decode this class into:
GO_0000785, BFO_0000050, GO_0005694
GO_0000785, rdfs:subClassOf, GO_0110165
<owl:Axiom>
<owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/CL_0002004"/>
<owl:annotatedProperty rdf:resource="http://www.w3.org/2000/01/rdf-schema#subClassOf"/>
<owl:annotatedTarget rdf:resource="http://purl.obolibrary.org/obo/CL_0000547"/>
<oboInOwl:is_inferred rdf:datatype="http://www.w3.org/2001/XMLSchema#string">true</oboInOwl:is_inferred>
</owl:Axiom>
The OWL class CL_0002004 (i.e. CD34-negative, GlyA-negative proerythroblast) has the following logical statement:
- CL_0002004 SubClassOf CL_0000547 (CD34-negative, GlyA-negative proerythroblast subClassOf proerythroblast)
OWL-NETS would decode this axiom into:
CL_0002004, rdfs:subClassOf, CL_0000547
<owl:Axiom>
<owl:annotatedSource>
<owl:Class>
<owl:intersectionOf rdf:parseType="Collection">
<rdf:Description rdf:about="http://purl.obolibrary.org/obo/UBERON_0010757"/>
<owl:Restriction>
<owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/BFO_0000050"/>
<owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/NCBITaxon_9606"/>
</owl:Restriction>
</owl:intersectionOf>
<rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/UBERON_0002238"/>
</owl:Class>
</owl:annotatedSource>
<owl:annotatedProperty rdf:resource="http://www.w3.org/2000/01/rdf-schema#subClassOf"/>
<owl:annotatedTarget rdf:resource="http://purl.obolibrary.org/obo/UBERON_0002238"/>
<oboInOwl:source rdf:datatype="http://www.w3.org/2001/XMLSchema#string">FMA</oboInOwl:source>
</owl:Axiom>
The OWL class UBERON_0010757 (i.e. rib 8) has the following logical statements:
- UBERON_0010757 and BFO_0000050 some NCBITaxon_9606 (rib 8 part of Homo sapiens)
- UBERON_0010757 SubClassOf UBERON_0002238 (rib 8 subClassOf false rib)
OWL-NETS would decode this axiom into:
UBERON_0010757, BFO_0000050, NCBITaxon_9606
<owl:Axiom>
<owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/UBERON_0002373"/>
<owl:annotatedProperty rdf:resource="http://www.w3.org/2000/01/rdf-schema#subClassOf"/>
<owl:annotatedTarget>
<owl:Restriction>
<owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0002202"/>
<owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/UBERON_0010023"/>
</owl:Restriction>
</owl:annotatedTarget>
</owl:Axiom>
The OWL class UBERON_0002373 (i.e. palatine tonsil) has the following logical statement:
- UBERON_0002373 SubClassOf RO_0002202 some UBERON_0010023 (palatine tonsil develops from dorsal pharyngeal pouch 2)
OWL-NETS would decode this axiom into:
UBERON_0002373, RO_0002202, UBERON_0010023
<owl:Axiom>
<owl:annotatedSource>
<owl:Class>
<owl:intersectionOf rdf:parseType="Collection">
<rdf:Description rdf:about="http://purl.obolibrary.org/obo/UBERON_0005562"/>
<owl:Restriction>
<owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/BFO_0000050"/>
<owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/NCBITaxon_40674"/>
</owl:Restriction>
</owl:intersectionOf>
<rdfs:subClassOf>
<owl:Restriction>
<owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0002254"/>
<owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/UBERON_0010028"/>
</owl:Restriction>
</rdfs:subClassOf>
</owl:Class>
</owl:annotatedSource>
<owl:annotatedProperty rdf:resource="http://www.w3.org/2000/01/rdf-schema#subClassOf"/>
<owl:annotatedTarget>
<owl:Restriction>
<owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0002254"/>
<owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/UBERON_0010028"/>
</owl:Restriction>
</owl:annotatedTarget>
<oboInOwl:notes rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Mammals</oboInOwl:notes>
<oboInOwl:source rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ISBN:0073040584-table13.1</oboInOwl:source>
</owl:Axiom>
The OWL class UBERON_0005562 (i.e. thymus primordium) has the following logical statements:
- UBERON_0005562 and BFO_0000050 some NCBITaxon_40674 (thymus primordium part of Mammalia)
- UBERON_0005562 SubClassOf RO_0002254 some UBERON_0010028 (thymus primordium has developmental contribution from ventral part of pharyngeal pouch 4)
OWL-NETS would decode this axiom into:
UBERON_0005562, RO_0002254, UBERON_0010028
UBERON_0005562, BFO_0000050, NCBITaxon_40674
The algorithm has four goals, each of which is further explained below:
- Decode all OWL-encoded classes
- Remove all triples that contain subjects, predicates, and/or objects that are needed to ensure OWL semantics, but are not biologically meaningful
- Ensure the decoded knowledge graph contains a single connected component
- Purify the decoded knowledge graph to match an input knowledge graph construction approach (i.e. subclass or instance)
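The decoding goal can be illustrated with a minimal Python sketch that reproduces the CL_0000995, HP_0000340, and GO_0000785 examples above. It assumes the class expressions have already been parsed into a hypothetical nested-tuple form (`("union", [...])`, `("intersection", [...])`, `("some", property, filler)`); this representation and the `decode` function are illustrative only and are not part of pkt_kg, which operates on RDFLib graphs. Asserted named superclasses (e.g. HP_0000340 subClassOf HP_0000290) are assumed to be carried over separately.

```python
# Hypothetical nested-tuple encoding of OWL class expressions (illustration only):
#   ("union", [members]), ("intersection", [members]), ("some", property, filler)
SUBCLASS = "rdfs:subClassOf"
HAS_QUALITY = "RO_0000086"


def is_pato(term):
    """PATO terms are identified here by their CURIE-style prefix."""
    return term.startswith("PATO_")


def decode(cls, expr):
    """Decode one class expression attached to `cls` into a set of triples."""
    kind = expr[0]
    triples = set()
    if kind == "union":
        # Every member of a union becomes a subclass of the defined class.
        for member in expr[1]:
            triples.add((member, SUBCLASS, cls))
    elif kind == "intersection":
        for member in expr[1]:
            if member == cls:
                continue  # skip the defined class itself (axiom-level intersections)
            if isinstance(member, str):
                # Named members: PATO qualities become "has quality" edges,
                # anything else becomes a superclass of the defined class.
                pred = HAS_QUALITY if is_pato(member) and not is_pato(cls) else SUBCLASS
                triples.add((cls, pred, member))
            else:
                # Nested restrictions are decoded against the same class.
                triples |= decode(cls, member)
    elif kind == "some":
        _, prop, filler = expr
        if isinstance(filler, str):
            triples.add((cls, prop, filler))
        else:
            # The outer property (e.g. BFO_0000051) is dropped and the
            # nested expression is decoded against the same class.
            triples |= decode(cls, filler)
    else:
        raise ValueError(f"unsupported expression type: {kind}")
    return triples


hp_0000340 = (
    "some", "BFO_0000051",
    ("intersection", [
        "PATO_0001481",
        ("some", "RO_0000052", "UBERON_0008200"),
        ("some", "RO_0002573", "PATO_0000460"),
    ]),
)
for triple in sorted(decode("HP_0000340", hp_0000340)):
    print(triple)
```

The recursion mirrors how the examples flatten nested restrictions onto the class being defined, which is why the outer BFO_0000051 (has part) does not appear in the decoded HP_0000340 triples.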
A high-level overview of the algorithm is provided in the snippet of the pseudocode below.
- Map owl:Class instances back to the original owl:Class
- Remove all triples that contain a subject or object of type BNode or Literal
- Keep triples containing any owl:ObjectProperty occurring with subjects and objects that are owl:Class or owl:NamedIndividual
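The filtering step can be sketched as a single pass that drops triples with a blank-node or literal subject or object and keeps only object-property or `rdfs:subClassOf` edges between classes and individuals. In this hypothetical sketch terms are plain strings, blank nodes are assumed to start with `_:` and literals with a double quote, and `entity_types` / `object_properties` stand in for type information that pkt_kg would read from the RDFLib graph itself.

```python
# Hypothetical conventions for this sketch only: blank nodes start with "_:",
# literals start with a double quote, and everything else is a named entity.
KEEP_TYPES = {"owl:Class", "owl:NamedIndividual"}


def is_bnode(term):
    return term.startswith("_:")


def is_literal(term):
    return term.startswith('"')


def filter_owl_semantics(triples, entity_types, object_properties):
    """Split triples into (kept, filtered).

    A triple is kept only if neither end is a blank node or literal, both ends
    are typed as owl:Class or owl:NamedIndividual, and the predicate is an
    object property or rdfs:subClassOf.
    """
    kept, filtered = set(), set()
    for s, p, o in triples:
        structural = any(is_bnode(t) or is_literal(t) for t in (s, o))
        typed = entity_types.get(s) in KEEP_TYPES and entity_types.get(o) in KEEP_TYPES
        meaningful = p in object_properties or p == "rdfs:subClassOf"
        if not structural and typed and meaningful:
            kept.add((s, p, o))
        else:
            filtered.add((s, p, o))
    return kept, filtered


types = {"GO_0000785": "owl:Class", "GO_0005694": "owl:Class", "GO_0110165": "owl:Class"}
graph = {
    ("GO_0000785", "rdfs:subClassOf", "GO_0110165"),
    ("GO_0000785", "rdfs:subClassOf", "_:b1"),   # OWL restriction scaffolding
    ("_:b1", "owl:onProperty", "BFO_0000050"),
    ("GO_0000785", "rdfs:label", '"chromatin"'),
    ("GO_0000785", "BFO_0000050", "GO_0005694"),  # already-decoded edge
}
kept, filtered = filter_owl_semantics(graph, types, {"BFO_0000050"})
print(sorted(kept))
```

Returning the filtered triples alongside the kept ones makes it easy to record what was removed, in the spirit of the `filtered_triples` entry in the output hash map.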
Depending on the source ontology that you apply OWL-NETS to, it's possible that the decoded knowledge graph may contain more than a single connected component. This step ensures that the decoded knowledge graph is connected.
- Derive a set of root nodes by searching for each node's highest ancestor concept (via rdfs:subClassOf).
  - If the node has no ancestors, all of the node's immediate neighbors are searched and the most frequently visited, highest common ancestor among the neighbors is selected. If none of the neighborhood concepts have any ancestors in common, a random ancestor concept is selected.
  - If the node has more than 1 neighbor, the highest ancestor concept is selected.
- Each root node is then added to the graph as rdfs:subClassOf a user-provided URI. BFO_0000001 is the default choice.
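A simplified sketch of the connectivity step follows. It assumes triples are string 3-tuples, treats any node without an `rdfs:subClassOf` parent as a root, and attaches each root to a top-level class (BFO_0000001 by default). It deliberately omits the neighbor-based fallback described above, so it may attach more nodes to the top-level class than OWL-NETS would; `highest_ancestors`, `connect`, and `is_connected` are hypothetical names.

```python
from collections import defaultdict, deque

SUBCLASS = "rdfs:subClassOf"


def highest_ancestors(node, parents):
    """Return the top-most rdfs:subClassOf ancestors of `node` (itself if it has none)."""
    seen, stack, tops = {node}, [node], set()
    while stack:
        current = stack.pop()
        if not parents[current]:
            tops.add(current)
        for parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return tops


def connect(triples, top_level="BFO_0000001"):
    """Attach every root node to `top_level` via rdfs:subClassOf."""
    parents, nodes = defaultdict(set), set()
    for s, p, o in triples:
        nodes.update((s, o))
        if p == SUBCLASS:
            parents[s].add(o)
    roots = set()
    for node in nodes:
        roots |= highest_ancestors(node, parents)
    roots.discard(top_level)
    return set(triples) | {(root, SUBCLASS, top_level) for root in roots}


def is_connected(triples):
    """Check for a single weakly connected component (edge direction ignored)."""
    adjacency = defaultdict(set)
    for s, _, o in triples:
        adjacency[s].add(o)
        adjacency[o].add(s)
    if not adjacency:
        return True
    start = next(iter(adjacency))
    seen, queue = {start}, deque([start])
    while queue:
        for neighbor in adjacency[queue.popleft()]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(adjacency)


two_components = {("A", SUBCLASS, "B"), ("C", "RO_0002202", "D")}
print(is_connected(two_components), is_connected(connect(two_components)))
```

Connectivity is checked on the undirected view of the graph because knowledge graph components are usually defined without regard to edge direction.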
Currently, the program is configured to output the results from OWL-NETS in two ways: (1) run the program as-is or (2) run the program as-is with an additional step to "purify" the output by ensuring that the resulting OWL-NETS graph is completely consistent with the specified knowledge graph construction approach (i.e. subclass or instance-based). The "purified" output will include _SUBCLASS_purified_ or _INSTANCE_purified_ in the file names.
The procedure utilized to "purify" the graph is as follows:
- Subclass Construction Approach:
  - Find all triples containing rdf:type (subj rdf:type obj)
    - Replace rdf:type with rdfs:subClassOf
    - Make subj rdfs:subClassOf all ancestors of obj
- Instance Construction Approach:
  - Find all triples containing rdfs:subClassOf (subj rdfs:subClassOf obj)
    - Replace rdfs:subClassOf with rdf:type
    - Make subj rdf:type all ancestors of obj
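The purification rules can be sketched as follows, again assuming string 3-tuples and the hypothetical function names `ancestors` and `purify`. Ancestors are computed by walking the `rdfs:subClassOf` edges already present in the input, before any predicates are rewritten.

```python
from collections import defaultdict

SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"


def ancestors(node, parents):
    """All transitive rdfs:subClassOf ancestors of `node`."""
    seen, stack = set(), [node]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def purify(triples, approach="subclass"):
    """Rewrite rdf:type / rdfs:subClassOf edges to match one construction approach."""
    if approach == "subclass":
        source, target = TYPE, SUBCLASS
    elif approach == "instance":
        source, target = SUBCLASS, TYPE
    else:
        raise ValueError("approach must be 'subclass' or 'instance'")
    parents = defaultdict(set)
    for s, p, o in triples:
        if p == SUBCLASS:
            parents[s].add(o)
    purified = set()
    for s, p, o in triples:
        if p != source:
            purified.add((s, p, o))
            continue
        # Replace the predicate, then link subj to every ancestor of obj.
        purified.add((s, target, o))
        for ancestor in ancestors(o, parents):
            purified.add((s, target, ancestor))
    return purified


graph = {("ind1", TYPE, "B"), ("B", SUBCLASS, "A"), ("ind1", "RO_0002202", "X")}
print(sorted(purify(graph, "subclass")))
```

Building the ancestor map from the original triples keeps the rewrite independent of the order in which edges are processed.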
ASSUMPTIONS:
Don't Decode
- Classes built using the owl:complementOf constructor
- Triples containing annotations
- Triples that contain oneOf (e.g. IAO_0000225)
- Triples containing properties signifying negation, whether an ObjectProperty or an owl:Class (e.g. lacks_part, disjointWith)
Decode
- The following property types: someValuesFrom, onClass, hasSelf, hasValue, allValuesFrom
- Triples containing cardinality constraints (the cardinality itself is ignored)
To determine owl:ObjectProperties in decoded owl:intersectionOf or owl:unionOf constructors:
- RO_0000086 (has quality): if the subject is NOT a PATO term and the object IS a PATO term
- Provided onProperty: if both the subject and object ARE PATO terms AND there is an onProperty provided
- rdfs:subClassOf (subclass build) / rdf:type (instance build):
  - If both the subject and object ARE PATO terms AND there is not an onProperty
  - If both the subject and object ARE NOT PATO terms AND there is not an onProperty
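The predicate-selection rules above can be written as a small function. The rules do not say what happens for some combinations (for example, a PATO subject paired with a non-PATO object); the final fallback in this hypothetical `select_predicate` sketch, which uses the onProperty if one is given and otherwise the hierarchy predicate, is an assumption rather than documented behavior.

```python
def is_pato(term):
    """PATO terms are identified here by their CURIE-style prefix."""
    return term.startswith("PATO_")


def select_predicate(subj, obj, on_property=None, approach="subclass"):
    """Pick the predicate for a decoded intersectionOf/unionOf edge."""
    hierarchy = "rdfs:subClassOf" if approach == "subclass" else "rdf:type"
    subj_pato, obj_pato = is_pato(subj), is_pato(obj)
    if not subj_pato and obj_pato:
        return "RO_0000086"  # has quality
    if subj_pato and obj_pato and on_property is not None:
        return on_property  # provided onProperty
    if on_property is None and subj_pato == obj_pato:
        return hierarchy  # both PATO or both non-PATO, no onProperty
    # Combinations the rules above do not cover (assumption): prefer the
    # provided onProperty, otherwise fall back to the hierarchy predicate.
    return on_property if on_property is not None else hierarchy


print(select_predicate("HP_0000340", "PATO_0001481"))
print(select_predicate("PATO_0001481", "PATO_0000460", "RO_0002573"))
```

Checking the RO_0000086 rule first matches the order in which the rules are listed; swapping the order would change the result only for the undocumented cases.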
Inputs and Outputs:
- Input Data:
  - A NetworkX MultiDiGraph
  - An RDFLib Graph
  - A filepath and filename to write output to
- Output Data:
  - A NetworkX MultiDiGraph
  - An RDF graph, serialized in nt (N-Triples) format, in which all OWL-encoded classes have been decoded (Step 1) and triples containing OWL semantics have been removed (Step 2)
  - A hash map storing transformation information:
    {'owl_nets': { 'decoded_classes': {}, 'complementOf': {}, 'cardinality': {}, 'negation': {}, 'misc': {}}, 'disjointWith': {}, 'filtered_triples': set(), '<<knowledge construction approach>>_approach_purified': set()}
Jupyter Notebook: OWLNETS_Example_Application.ipynb
To run OWL-NETS on a graph or ontology without running the full pkt_kg pipeline, you need to: (1) fork or clone the PheKnowLator GitHub repository; (2) provide an RDFLib Graph() object or a file path to the object you want to transform; (3) provide a path to where the output should be written; and (4) provide a filename (e.g. owl_nets_output). From the PheKnowLator directory, run the following code:
from rdflib import Graph
from pkt_kg.owlnets import OwlNets
# load ontology
hp_graph = Graph().parse('path/to/file/hp.owl')
# instantiate class
owl_nets = OwlNets(graph=hp_graph, write_location='resources/', filename='/hpo_test')
# run the method
owl_nets.run_owl_nets()