Ali Haider Bangash edited this page Apr 8, 2020 · 7 revisions

Task 1: Results


Results shall be collected in Task 1: The Results. Results could be analyses that represent new knowledge about SARS-CoV-2 genomic features. Outputs that plug into other topics in the biohackathon would also be helpful, e.g. genomic features shown to carry information about a modeled response variable, which could then be added to the annotations in public_sequence_resource.

Task 2: RNA Secondary Structure Analysis

A feature extraction pipeline that uses RNA secondary structure features generated by ViennaRNA folding utilities.
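A minimal sketch of the last step of such a pipeline, assuming the ViennaRNA step has already produced one dot-bracket structure per sequence (RNAfold prints one, and the Python bindings' `RNA.fold(sequence)` returns a `(structure, mfe)` pair). The function name `dot_bracket_features` and the particular features chosen are illustrative assumptions, not this project's actual feature set.

```python
import re


def dot_bracket_features(structure: str) -> dict:
    """Turn one dot-bracket string (e.g. RNAfold output) into numeric features.

    Only plain '(' ')' '.' notation is handled; pseudoknot brackets are rejected.
    """
    depth = 0
    max_depth = 0
    pairs = 0
    for ch in structure:
        if ch == "(":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == ")":
            if depth == 0:
                raise ValueError("unbalanced structure: extra ')'")
            depth -= 1
            pairs += 1
        elif ch != ".":
            raise ValueError(f"unsupported character {ch!r}")
    if depth != 0:
        raise ValueError("unbalanced structure: unclosed '('")
    length = len(structure)
    return {
        "length": length,
        "base_pairs": pairs,
        # fraction of nucleotides that take part in a base pair
        "paired_fraction": 2 * pairs / length if length else 0.0,
        # a hairpin loop: an opening bracket, unpaired bases, then a closing bracket
        "hairpin_loops": len(re.findall(r"\(\.+\)", structure)),
        "max_nesting_depth": max_depth,
    }


print(dot_bracket_features("((((...))))..((...))"))
```

The minimum free energy returned alongside the structure could simply be appended to this dictionary as one more feature.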

Task 3: MHC binding predictions

Only specific fragments of viral (or other foreign) proteins are presented on the cell surface, bound to MHC molecules, where the immune system can recognize them. Machine-learning-based models therefore predict which fragments, the so-called epitopes, bind MHC class I or II with high affinity and which do not. The predicted binding affinities for MHC class I and II could be integrated as features.
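One way to turn per-peptide predictions into protein-level features is sketched below. It assumes a predictor that returns an IC50 value in nM for each 9-mer (a common peptide length for MHC class I); `predict_ic50` is a hypothetical placeholder for whichever tool is used, and the 50 nM / 500 nM strong/weak binder cut-offs are a widely used convention rather than a project decision.

```python
from typing import Callable, Dict, List


def peptides(protein: str, k: int = 9) -> List[str]:
    """All overlapping k-mer peptides of a protein sequence."""
    return [protein[i:i + k] for i in range(len(protein) - k + 1)]


def mhc_binding_features(
    protein: str,
    predict_ic50: Callable[[str], float],  # hypothetical predictor: peptide -> IC50 (nM)
    k: int = 9,
    strong_nm: float = 50.0,
    weak_nm: float = 500.0,
) -> Dict[str, float]:
    """Aggregate per-peptide predicted IC50 values into protein-level features.

    Lower IC50 means stronger predicted binding.
    """
    scores = [predict_ic50(p) for p in peptides(protein, k)]
    if not scores:
        return {"n_peptides": 0, "strong_binders": 0, "weak_binders": 0,
                "min_ic50": float("inf")}
    return {
        "n_peptides": len(scores),
        "strong_binders": sum(s < strong_nm for s in scores),
        "weak_binders": sum(strong_nm <= s < weak_nm for s in scores),
        "min_ic50": min(scores),
    }


# Toy stand-in predictor, for illustration only.
def toy_predictor(p: str) -> float:
    return 30.0 if p[0] == "L" else (200.0 if p[0] == "V" else 800.0)


print(mhc_binding_features("MFVFLVLLPLVSSQCVNL", toy_predictor))
```

The same aggregation can be run once per MHC class (or per allele) to give separate class I and class II feature columns.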

Results shall be collected in Task 3: The Results.

Web tools that can be used as part of feature extraction: T cell Epitope Prediction Tools, B cell Epitope Prediction Tools.

We need to aggregate data for the antigenic protein sequences of Coronaviridae, e.g. spike, nucleocapsid, etc.

Task 4: Nucleic acid feature analysis

Clustering and supervised analysis using k-mer-based feature extraction. Outputs could include insights into the relationships between different coronavirus species, comparisons among SARS-CoV-2 genomes, and feature selection methods that identify the features most informative about an output variable.
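A minimal sketch of k-mer feature extraction: each genome becomes a fixed-length vector of k-mer relative frequencies, and pairwise cosine distances between those vectors can feed a clustering step. The choice of k = 3, cosine distance, and skipping k-mers that contain ambiguous bases such as N are assumptions for illustration; the function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt
from typing import Dict, List, Tuple


def kmer_vector(seq: str, k: int = 3, alphabet: str = "ACGT") -> List[float]:
    """Relative frequencies of all len(alphabet)**k k-mers, in a fixed order.

    k-mers containing other symbols (e.g. N for ambiguous bases) are skipped.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k] for i in range(len(seq) - k + 1)
        if all(c in alphabet for c in seq[i:i + k])
    )
    total = sum(counts.values()) or 1
    return [counts["".join(p)] / total for p in product(alphabet, repeat=k)]


def cosine_distance(a: List[float], b: List[float]) -> float:
    """1 - cosine similarity; 0 for identical profiles, 1 for disjoint ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def distance_matrix(seqs: Dict[str, str], k: int = 3) -> Dict[Tuple[str, str], float]:
    """Pairwise k-mer profile distances between named sequences."""
    vecs = {name: kmer_vector(s, k) for name, s in seqs.items()}
    names = list(vecs)
    return {(a, b): cosine_distance(vecs[a], vecs[b]) for a in names for b in names}
```

The same vectors can be used directly as the feature matrix for a supervised model, so feature selection then amounts to ranking individual k-mers. For hierarchical clustering, the distances could be converted to a condensed matrix and passed to something like SciPy's `scipy.cluster.hierarchy.linkage`.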


Task 5: Amino acid feature analysis

Similar to the nucleic acid feature analysis, but for protein sequences.
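For proteins the same k-mer idea applies over the 20 standard amino acids: k = 1 gives amino acid composition (20 features) and k = 2 gives dipeptide composition (400 features). The sketch below assumes non-standard symbols (such as `X` or a `*` stop) should simply be skipped; the function names are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def kmer_feature_names(k: int = 2) -> List[str]:
    """Column names matching the order of peptide_kmer_vector's output."""
    return ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]


def peptide_kmer_vector(protein: str, k: int = 2) -> List[float]:
    """Relative frequencies of all 20**k amino acid k-mers in a protein.

    k-mers containing non-standard symbols (e.g. X, *) are skipped.
    """
    protein = protein.upper()
    counts = Counter(
        protein[i:i + k] for i in range(len(protein) - k + 1)
        if all(c in AMINO_ACIDS for c in protein[i:i + k])
    )
    total = sum(counts.values()) or 1
    return [counts[name] / total for name in kmer_feature_names(k)]


composition = dict(zip(kmer_feature_names(1), peptide_kmer_vector("MFVF", k=1)))
print({aa: f for aa, f in composition.items() if f > 0})
```

Because 20**k grows quickly, k above 2 or 3 produces very sparse vectors, which is worth keeping in mind when choosing feature selection methods.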
