CORD 19 Semantic Annotation Projects
This page lists projects that are doing semantic annotation of the CORD-19 dataset. If you know of a project that is not listed here, please add it AND please contact David Booth, who chairs a semi-weekly teleconference (11am Boston time) to coordinate and learn about each other's efforts.
Teleconferences are announced on the public W3C Healthcare and Life Sciences mailing list.
- 2020-09-29 Houcemeddine Turki, University of Sfax, Tunisia: Wikidata and COVID-19, Creating a collaborative knowledge graph from CORD-19 scholarly publications
- 2020-07-28 Marcin Joachimiak, Lawrence Berkeley Natl Lab: KG-COVID-19, A knowledge graph for COVID-19 response
- 2020-07-21 Michael Liebman, IPQ Analytics: Modeling COVID-19, from the clinic back
- 2020-06-23 Jin-Dong Kim: Covid19-PubAnnotation
- 2020-06-16 Victor Mireles, Semantic Web Company: COVID-19 Knowledge Graph
- 2020-06-16 Feichen Shen and David Oniani, Mayo Clinic: Constructing Co-occurrence Network Embeddings
- 2020-06-02 Scott Malec, University of Pittsburgh: CORD-SEMANTICTRIPLES / Machine Reading for COVID-19 and Alzheimer's
- 2020-06-02 Pedro Szekely, USC Information Sciences Institute: A Knowledge Graph Integrating Annotations On 20,000 COVID-19 Scientific Articles
- 2020-05-26 Oliver Giles, SciBite: TERMite CORD19
- 2020-05-19 Gaurav Vaidya: OmniCORD
- 2020-05-19 Gollam Rabby, VSE University, Prague: Entity-Based-Document-Classification-on-the-CORD---19-Corpus
- 2020-05-19 Marcin Joachimiak, Lawrence Berkeley National Laboratory: KG-COVID-19, a knowledge graph for COVID-19 response
- 2020-05-19 Michael Liebman, IPQ Analytics LLC: Modeling COVID-19 From the Clinic Back
- 2020-05-19 David Booth, Mayo Clinic (consultant): CORD-19-on-FHIR
- 2020-05-12 Franck Michel, Université Côte d’Azur, CNRS, Inria: CORD-19 Named Entities Knowledge Graph (CORD19-NEKG)
- Project name: COVID-KG
- Project name: CORD-ReDrugS
- @@@@
2020-09-29 Houcemeddine Turki, University of Sfax, Tunisia: Wikidata and COVID-19, Creating a collaborative knowledge graph from CORD-19 scholarly publications
Contact name and email: Houcemeddine A. Turki turkiabdelwaheb@hotmail.fr
Description: Knowledge graphs are an essential ingredient for information systems to handle the ever-growing COVID-19 data on a daily basis. This presentation explains how open and collaborative FAIR knowledge bases like Wikidata can be useful to create a large-scale semantic representation of COVID-19 information from CORD-19 scholarly publications. I give an overview of how a data model has been collaboratively developed and maintained for COVID-19 knowledge, and I provide a detailed snapshot of the various methods used to extract items and statements from CORD-19 research papers. Then, I outline the tools for the enrichment of COVID-19 information on Wikidata as well as the knowledge graph validation methods applicable to COVID-19 knowledge. Finally, I describe the COVID-19 information in Wikidata and discuss its usefulness in supporting human decisions and social recommendations about the infectious disease.
Data format(s): RDF
Data license: CC0
Website or github URL: https://www.wikidata.org/wiki/Wikidata:WikiProject_COVID-19
Draft paper: https://zenodo.org/record/4033382 and https://zenodo.org/record/4008359
Slides: https://commons.wikimedia.org/wiki/File:W3_CORD-19_-_Wikidata_and_COVID-19.pdf
Video presentation (recorded 2020-09-29): https://youtu.be/TwudGFtT4A4
Chat comments made during presentation: It is possible to link to the individual phrase from which a Wikidata statement originated. Demo at the reference for the “schizophrenia” claim in https://www.wikidata.org/w/index.php?title=Q13561329&oldid=1278213857#P1910 . It has a “reference URL” that points to https://via.hypothes.is/https://pubmed.ncbi.nlm.nih.gov/11126396/#annotations:HVXGMnfUEeetV2sj-_VpSQ . We are using existing vocabularies to the extent possible. SNOMED-CT and many others, however, are not openly licensed, so we cannot incorporate them wholesale. What we can do, though, is mapping. The ORKG structured annotations are a demo for COVID; they do not yet work at scale. Here is an example of such an argumentation-focused knowledge graph: https://hi-knowledge.org/ It focuses on invasion biology for now.
2020-07-28 Marcin Joachimiak, Lawrence Berkeley Natl Lab: KG-COVID-19, A knowledge graph for COVID-19 response
Contact name and email: Marcin Joachimiak marcinjoachimiak@gmail.com
Description:
Data format(s):
Data license:
Website or github URL:
Paper: https://www.biorxiv.org/content/10.1101/2020.08.17.254839v1
Video presentation (recorded 2020-07-28): https://youtu.be/iGqRvKhuuSs
2020-07-21 Michael Liebman, IPQ Analytics: Modeling COVID-19, from the clinic back
Contact name and email: Michael Liebman michael.liebman@ipqanalytics.com
Description:
Data format(s):
Data license:
Website or github URL:
Draft paper:
Slides:
Video presentation (recorded 2020-07-21): https://youtu.be/ueJunueo3hg
2020-06-23 Jin-Dong Kim: Covid19-PubAnnotation
Contact name and email: Jin-Dong Kim jindong.kim@gmail.com
Description: PubAnnotation is a repository of text annotations, especially those made to literature of life sciences, e.g., PubMed or PMC articles. If one has such annotations, they can be registered in PubAnnotation. When annotations are registered, PubAnnotation aligns them to the canonical text that is taken from PubMed and PMC, which means all the annotations in PubAnnotation are linked to each other through canonical texts. It is a new way of publishing or sharing text annotations using recent web technology: annotations will become accessible and searchable through standard web protocol, e.g., REST API.
Data format(s):
Data license:
Website or github URL: https://pubannotation.org/
Video presentation (recorded 2020-06-23): https://youtu.be/LKz9kRtLi9I
2020-06-16 Victor Mireles, Semantic Web Company: COVID-19 Knowledge Graph
Contact name and email: Victor Mireles-Chavez victor.mireles-chavez@semantic-web.com
Description:
Data format(s):
Data license:
Website or github URL:
Draft paper:
Slides: https://docs.google.com/presentation/d/1xaS_88sJ47iSrvv0ezOfjscIvG2VINUe7vqrUEMiaCA/edit?usp=sharing
Video presentation (recorded 2020-06-16): https://youtu.be/HnoTDndSK_A
2020-06-16 Feichen Shen and David Oniani, Mayo Clinic: Constructing Co-occurrence Network Embeddings
Contact name and email: Feichen Shen, Ph.D. Shen.Feichen@mayo.edu and David Oniani Oniani.David@mayo.edu
Description: Constructing Co-occurrence Network Embeddings to Assist Association Extraction for COVID-19 and Other Coronavirus Infectious Diseases
Data format(s):
Data license:
Website or github URL:
Slides: https://www.davidoniani.com/research/co-occurence-network-embeddings-presentation.pdf
Video presentation (recorded 16-Jun-2020): https://youtu.be/RxEsBP40OxE
2020-06-02 Scott Malec, University of Pittsburgh: CORD-SEMANTICTRIPLES / Machine Reading for COVID-19 and Alzheimer's
Contact name and email: Scott Malec (scott.malec@gmail.com | sam413@pitt.edu)
Description: Computable knowledge extracted from the literature using machine reading can help researchers better understand and leverage the unprecedented volume of information gathered about the novel coronavirus. We hypothesize that machine interpretation techniques can be used to build graphical models of related concepts, with highly-connected nodes suggesting potentially plausible biological actors. We introduce a new resource, derived from the Semantic MEDLINE database (SemMedDB), reflecting documents also in the COVID-19 corpus. SemMedDB contains concept-relation-concept semantic triples, or predications. After extracting ~106K semantic predications, we imported these into a network and applied network centrality metrics (degree, closeness, betweenness) to identify and substantiate association factors related to COVID-19 for biological plausibility. Filtering the nodes by semantic type to search for drugs, drug targets, biomarkers, or comorbidities associated with complications, we were able to recapitulate agents already in randomized controlled trials for preventing or treating COVID-19 infections, as well as comorbidities associated with lethal complications, many of which made sense upon further inspection. This guilt-by-association analysis demonstrates the value of the information revealed as computable knowledge by machine reading software.
Data format(s): RDF/XML, SQL, including Cytoscape-compatible formats (*.tsv, *.SIF)
Data license: TBD
Website or github URL: https://github.com/kingfish777/COVID19 (still a mess). See the .cys file for a Cytoscape-friendly version and the .xls spreadsheet for preliminary results. I will be uploading other formats, including the processing pipeline, and pointing to a more ambitious follow-up project that applies computable knowledge, derived from the COVID-19 corpus using various machine reading frameworks, to support several practical use cases.
Draft paper: https://docs.google.com/document/d/1qQkLlvwOWOy1Rt7eUTKTCUfh-WodyXd8uv8UTnpCGoA/edit#
Slides: https://docs.google.com/presentation/d/13upacoOuKXhguToT-z2MNPE_iDJWY0vvbFz8NDAWpaQ/edit?usp=sharing
Video presentation (recorded June 2, 2020): https://www.youtube.com/watch?v=ydnx_Rg1PYs
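The description above mentions importing predications into a network and ranking nodes by centrality. The following is only a minimal sketch of that idea, not the project's actual pipeline: the triples are hypothetical placeholders rather than SemMedDB data, the graph is treated as undirected, and only degree and closeness are computed (betweenness is omitted for brevity).

```python
from collections import deque

# Hypothetical subject-predicate-object predications, for illustration only.
TRIPLES = [
    ("drug_A", "TREATS", "disease_X"),
    ("drug_B", "TREATS", "disease_X"),
    ("gene_G", "ASSOCIATED_WITH", "disease_X"),
    ("drug_A", "INHIBITS", "gene_G"),
    ("disease_X", "COEXISTS_WITH", "disease_Y"),
]


def build_graph(triples):
    """Return an undirected adjacency map, ignoring the predicates."""
    graph = {}
    for subj, _pred, obj in triples:
        graph.setdefault(subj, set()).add(obj)
        graph.setdefault(obj, set()).add(subj)
    return graph


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly linked to."""
    others = max(len(graph) - 1, 1)  # avoid dividing by zero on tiny graphs
    return {node: len(nbrs) / others for node, nbrs in graph.items()}


def closeness_centrality(graph):
    """Reachable node count divided by the sum of shortest-path lengths."""
    scores = {}
    for source in graph:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    degree = degree_centrality(graph)
    closeness = closeness_centrality(graph)
    for node in sorted(graph, key=degree.get, reverse=True):
        print(f"{node}\tdegree={degree[node]:.2f}\tcloseness={closeness[node]:.2f}")
```

In this toy graph the most connected node is the one every other concept touches, which is the "guilt-by-association" intuition; a real analysis would also filter nodes by semantic type, as the description notes.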
2020-06-02 Pedro Szekely, USC Information Sciences Institute: A Knowledge Graph Integrating Annotations On 20,000 COVID-19 Scientific Articles
Contact name and email: Pedro Szekely szekely@usc.edu, USC Information Sciences Institute
Description:
Data format(s):
Data license:
Website or github URL: https://github.com/usc-isi-i2/kgtk
Paper: https://arxiv.org/abs/2006.00088
Jupyter Notebook with example on how to create the COVID-19 KG using KGTK: https://github.com/usc-isi-i2/CKG-COVID-19/blob/dev/build-covid-kg.ipynb
Slides: https://docs.google.com/presentation/d/1_uFKP6xmcV0rYjqVxorEI97weN4uauPxG11YHTo6tD8/edit?usp=sharing
Video presentation (recorded June 2, 2020): https://www.youtube.com/watch?v=ydnx_Rg1PYs&t=2346s
2020-05-26 Oliver Giles, SciBite: TERMite CORD19
Contact name and email: Oliver Giles oliver.giles@scibite.com and James Malone james@scibite.com, SciBite
Description:
Data format(s):
Data license:
Website or github URL: https://www.scibite.com/
Slides: https://lists.w3.org/Archives/Public/www-archive/2020May/att-0003/covid19.pdf
Video presentation (recorded May 26, 2020): https://www.youtube.com/watch?v=3IdkRU9Durc
2020-05-19 Gaurav Vaidya: OmniCORD
Contact name and email: Gaurav Vaidya, http://www.ggvaidya.com/
Description:
Data format(s):
Data license:
Website or github URL: https://github.com/NCATS-Gamma/omnicorp
Video presentation (recorded May 19, 2020): https://www.youtube.com/watch?v=YcoG9H6r7R0&t=9s
2020-05-19 Gollam Rabby, VSE University, Prague: Entity-Based-Document-Classification-on-the-CORD---19-Corpus
Contact name and email: Tomáš Kliegr tomas.kliegr@vse.cz, Gollam Rabby rabby2186@gmail.com
Description: Tools for extracting associations from knowledge graphs and transaction data
Presentation: https://docs.google.com/presentation/d/1eX9eTb0C8roy7pYK8li3V5YcWhFByN6is6hz_b62AcA/edit#slide=id.p
Data format(s): RDF, SQL Dumps, CSV
Data license: NA
Website or github URL:
- JupyterLab notebook with existing code for entity extraction.
- Demo of our web-based self-service EasyMiner tool for learning rules from single CSVs.
- Demo of our web-based self-service RDFRules tool for learning rules from RDF KGs.
Video presentation (recorded May 19, 2020): https://www.youtube.com/watch?v=YcoG9H6r7R0&t=525s
2020-05-19 Marcin Joachimiak, Lawrence Berkeley National Laboratory: KG-COVID-19, a knowledge graph for COVID-19 response
Contact name and email: Marcin Joachimiak marcinjoachimiak@gmail.com, Lawrence Berkeley National Laboratory, Monarch Initiative, and IDG
Description: Lightweight construction and maintenance of knowledge graphs for COVID-19 drug repurposing efforts.
Data format(s): RDF/TTL http://kg-hub.berkeleybop.io/kg-covid-19.nt.gz
Data license:
Website or github URL: https://covidscholar.org/
Slides: https://lists.w3.org/Archives/Public/www-archive/2020May/att-0002/01-part
Video presentation (recorded May 19, 2020): https://www.youtube.com/watch?v=YcoG9H6r7R0&t=996s
2020-05-19 Michael Liebman, IPQ Analytics LLC: Modeling COVID-19 From the Clinic Back
Contact name and email: Michael Liebman michael.liebman@ipqanalytics.com, IPQ Analytics LLC
Description:
Data format(s):
Data license:
Website or github URL:
Slides: Not available
Video presentation (recorded May 19, 2020): https://www.youtube.com/watch?v=YcoG9H6r7R0&t=1503s
2020-05-19 David Booth, Mayo Clinic (consultant): CORD-19-on-FHIR
Contact name and email: David Booth david@dbooth.org, Guoqian Jiang, M.D., Ph.D. Jiang.Guoqian@mayo.edu, Harold Solbrig solbrig@jhu.edu
Description: We are currently doing NLP to extract Conditions, Medications and Procedures from title and abstract. We plan to expand this to also look at the article full text where available. We are also using Pubtator to extract Species, Gene, Disease, Chemical, CellLine, Mutation and Strain. The result is represented in FHIR RDF.
Data format(s): FHIR RDF
Data license: Our annotations are CC0 licensed, though the CORD-19 dataset has its own licensing.
Website or github URL: https://github.com/fhircat/CORD-19-on-FHIR
Slides: https://tinyurl.com/cord-19-on-fhir
Video presentation (recorded May 19, 2020): https://www.youtube.com/watch?v=YcoG9H6r7R0&t=2218s
2020-05-12 Franck Michel, Université Côte d’Azur, CNRS, Inria: CORD-19 Named Entities Knowledge Graph (CORD19-NEKG)
Contact name and email: Fabien Gandon fabien.gandon@inria.fr, Franck Michel fmichel@i3s.unice.fr
Description: CORD-19 Named Entities Knowledge Graph (CORD19-NEKG) is an RDF dataset describing named entities identified in the scholarly articles of the COVID-19 Open Research Dataset (CORD-19). CORD19-NEKG is an initiative of the Wimmics team. RDF files are generated using Morph-xR2RML, an implementation of the xR2RML mapping language.
Data format(s): RDF Turtle
Data license: CORD19 license for the part of the dataset which is just an RDF version of CORD19 metadata, Open Data Commons Attribution License (https://opendatacommons.org/licenses/by/index.html) for the annotations that we produced.
Website or github URL: https://github.com/Wimmics/cord19-nekg
Download: https://github.com/Wimmics/cord19-nekg/tree/master/dataset
SPARQL endpoint: https://covid19.i3s.unice.fr/sparql
Paper: https://hal.archives-ouvertes.fr/hal-02939363/document
Video presentation (recorded May 12, 2020): https://www.youtube.com/watch?v=oUk9PXGM2fY
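For readers who want to try the SPARQL endpoint listed in the CORD19-NEKG entry above, here is a minimal sketch of how a query can be encoded for the standard SPARQL 1.1 Protocol "query via GET" operation. The query is deliberately generic and assumes nothing about the dataset's vocabulary; the helper name `build_sparql_get_url` is hypothetical, and whether the endpoint is still online is not something this page confirms. The sketch only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

# Endpoint URL as listed in the CORD19-NEKG entry; availability is not guaranteed.
ENDPOINT = "https://covid19.i3s.unice.fr/sparql"

# Generic query that makes no assumption about the graph's classes or properties.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_sparql_get_url(endpoint, query):
    """Encode a query string as the 'query' parameter of a GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(build_sparql_get_url(ENDPOINT, QUERY))
```

To actually run the query, the resulting URL could be fetched with `urllib.request.urlopen`, typically with an `Accept: application/sparql-results+json` header to request JSON results.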
Project name: COVID-KG
Contact name and email: Gilles Vandewiele gilles.vandewiele@ugent.be, Bram Steenwinckel bram.steenwickel@ugent.be
Description: Transform the JSON and CSV files into RDF to create a knowledge graph that contains at least the same information as the original dataset, plus additional knowledge to facilitate analysis by other researchers.
Data format(s): RDF
Data license: TBD, as open as possible.
Website or github URL: http://github.com/GillesVandewiele/COVID-KG/ & https://www.kaggle.com/group16/covid19-literature-knowledge-graph
Slides:
Video presentation (recorded @@@@):
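As a rough illustration of the JSON-to-RDF transformation that the COVID-KG description refers to, the sketch below turns one simplified record into N-Triples lines. It is not the project's code: the record layout, the `http://example.org/paper/` namespace, and the choice of Dublin Core Terms properties are all assumptions made for this example, and the paper identifier is assumed to contain only characters that are safe inside an IRI.

```python
import json

BASE = "http://example.org/paper/"   # hypothetical namespace for paper IRIs
DCT = "http://purl.org/dc/terms/"    # Dublin Core Terms vocabulary


def literal(text):
    """Escape a Python string as a plain N-Triples literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def record_to_ntriples(record):
    """Return one N-Triples line for the title and one per author."""
    subject = "<" + BASE + record["paper_id"] + ">"
    lines = [f"{subject} <{DCT}title> {literal(record['title'])} ."]
    for author in record.get("authors", []):
        lines.append(f"{subject} <{DCT}creator> {literal(author)} .")
    return lines


# Hypothetical, simplified record; real CORD-19 JSON files are richer than this.
SAMPLE = json.loads(
    '{"paper_id": "abc123", "title": "A \\"test\\" paper", '
    '"authors": ["A. Author", "B. Author"]}'
)

if __name__ == "__main__":
    print("\n".join(record_to_ntriples(SAMPLE)))
```

N-Triples is used here because each statement is a self-contained line, so output from many records can simply be concatenated; the "extra knowledge" mentioned in the description would be added as further triples about the same subject IRIs.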
Project name: CORD-ReDrugS
Contact name and email: Jim McCusker mccusj2@rpi.edu
Description: Enhance ReDrugS [1] to use extracted entities and relations from [2] to repurpose potential therapies.
[1] McCusker JP, Dumontier M, Yan R, He S, Dordick JS, McGuinness DL. 2017. Finding melanoma drugs through a probabilistic knowledge graph. PeerJ Computer Science 3:e106. doi:10.7717/peerj-cs.106
[2] kaggle.com/yitongtseo/cord19-named-entities
Data format(s): RDF, SIO + PROV
Data license: TBD
Website or github URL:
Slides:
Video presentation (recorded ): None
Contact name and email:
Description:
Data format(s):
Data license:
Website or github URL:
Draft paper:
Slides:
Video presentation (recorded ):