MachineLearning
Investigating Multi-Layer Perceptrons, Convolutional Neural Networks, Regression Models, and Ensemble methods to predict disease progression, the impact of geographical distribution, etc.
(Side Note: There seems to be some overlap between this Task and the BioStatistics Task. It may be worth merging the two.)
For the time being, there is a #machinelearning channel on the Slack group (see the virtual-biohackathon@googlegroups.com group for the invitation link). We will update this section during the BioHackathon.
Data
- Johns Hopkins repo
- European Centre for Disease Prevention and Control
- Automated Data Collection: COVID-19/SARS-COV-2 Cases in EU by Country, State/Province/Local Authorities, and Date
- SARS-CoV-2 sequences in GenBank
Tools
- Coronavirus Tracker API
- R package for the data collated by Johns Hopkins
- Penn Medicine - COVID19 Hospital Impact Model for Epidemics
- Epidemic Calculator by Gabriel Goh
- Pandemic Preparedness Planning for COVID-19, by Markus Schwehm and Martin Eichner together with the Landesgesundheitsamt Baden-Württemberg/Germany
- Models repo by Pedro Mendes of University of Connecticut
Machine learning often requires substantial computing resources, in many cases GPUs. Kubeflow, a portable, cloud-native platform for running machine learning workflows on Kubernetes, is well suited to this: containerised workloads can be ported onto it with little effort.
- Apply Markov Clustering (MCL) to the currently available SARS-CoV-2 sequences in GenBank in order to identify potential groupings beyond the traditional phylogenetic ones. Apply it at both the nucleotide (NT) and amino acid (AA) level, using a number of distance metrics (e.g. e-value, string distance).
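As a rough illustration of the MCL step, the Python sketch below builds a similarity graph from normalised Levenshtein (string) distance and runs the basic expansion/inflation loop with NumPy. The toy sequences, the 0.7 similarity threshold, and the function names `levenshtein`, `similarity_matrix` and `mcl` are hypothetical choices for illustration only. The same code works on NT or AA strings; an e-value-based graph would need a different edge weighting, and real GenBank-scale data would call for the standalone `mcl` program or sparse matrices rather than this dense toy version.

```python
import numpy as np


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (NT or AA sequences)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def similarity_matrix(seqs: list[str], threshold: float = 0.7) -> np.ndarray:
    """Hypothetical weighting: 1 - normalised edit distance; weak edges dropped.

    The diagonal is 1.0, which gives every node the self-loop MCL expects.
    """
    n = len(seqs)
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            longest = max(len(seqs[i]), len(seqs[j]), 1)
            sim = 1.0 - levenshtein(seqs[i], seqs[j]) / longest
            S[i, j] = sim if sim >= threshold else 0.0
    return S


def mcl(S: np.ndarray, expansion: int = 2, inflation: float = 2.0,
        max_iter: int = 100, tol: float = 1e-8,
        prune: float = 1e-5) -> list[tuple[int, ...]]:
    """Basic Markov Clustering on a non-negative similarity matrix."""
    M = S / S.sum(axis=0, keepdims=True)          # make columns stochastic
    for _ in range(max_iter):
        prev = M
        M = np.linalg.matrix_power(M, expansion)  # expansion: spread flow
        M = M ** inflation                        # inflation: boost strong flow
        M = M / M.sum(axis=0, keepdims=True)
        M[M < prune] = 0.0                        # prune negligible entries
        M = M / M.sum(axis=0, keepdims=True)      # re-normalise columns
        if np.allclose(M, prev, atol=tol):
            break
    # Attractor nodes keep mass on the diagonal; the non-zero columns of an
    # attractor's row form its cluster. A set removes duplicate clusters.
    clusters = {tuple(int(j) for j in np.nonzero(M[i])[0])
                for i in range(M.shape[0]) if M[i, i] > 0}
    return sorted(clusters)


if __name__ == "__main__":
    # Toy sequences: two groups that share no characters with each other.
    seqs = ["ACACACACAC", "ACACACACAA", "ACACACACCC",
            "GTGTGTGTGT", "GTGTGTGTGG", "GTGTGTGTTT"]
    print(mcl(similarity_matrix(seqs)))  # → [(0, 1, 2), (3, 4, 5)]
```

The inflation parameter controls cluster granularity: higher values tend to split the graph into more, smaller clusters, so it would be worth comparing several settings (and several distance metrics) before reading anything into the groupings.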