Aneesh Panoli edited this page Mar 27, 2020 · 51 revisions

Investigating Multi-Layer Perceptrons, Convolutional Neural Networks, Regression Models, and Ensemble methods for predicting disease progression, the impact of geographical distribution, etc.

(Side Note: There seems to be some overlap between this Task and the BioStatistics Task. It may be worth considering merging these two.)

Communication

For the time being, there is a #machinelearning channel on the Slack group (check out the virtual-biohackathon@googlegroups.com group for the invitation link). During the BioHackathon, we'll update this section.

Resources

Data

Tools

Ideas

Machine learning requires substantial computing resources, in many cases GPUs. Kubeflow, a highly portable, cloud-native platform for workflows, is optimised for machine learning, and containerised workloads can easily be ported onto it.

  • Apply Markov Clustering (MCL) to the SARS-CoV-2 sequences currently available in GenBank in order to identify potential groupings beyond the traditional phylogenetic ones. Apply it at both the nucleotide (NT) and the amino acid (AA) level, using a number of distance metrics (e.g. e-value, string distance).
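
As a starting point for the MCL idea, here is a minimal sketch in pure Python. It is not the reference MCL implementation: the k-mer Jaccard similarity stands in for whichever metric is eventually chosen (e-value, string distance), and the function names, the toy sequences, and the parameter values (k=3, inflation=2.0) are assumptions made for illustration only. The same code accepts NT or AA strings.

```python
def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity_matrix(seqs, k=3):
    """Pairwise k-mer Jaccard similarity (an assumed stand-in metric)."""
    sets = [kmers(s, k) for s in seqs]
    n = len(seqs)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sim[i][i] = 1.0  # self-loop, as is usual before running MCL
        for j in range(i + 1, n):
            union = sets[i] | sets[j]
            score = len(sets[i] & sets[j]) / len(union) if union else 0.0
            sim[i][j] = sim[j][i] = score
    return sim


def normalize_columns(m):
    """Scale every column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain square matrix product; fine for toy inputs only."""
    n = len(a)
    return [[sum(a[i][x] * b[x][j] for x in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(sim, inflation=2.0, max_iter=100, tol=1e-9):
    """Alternate expansion and inflation until the matrix stops changing."""
    n = len(sim)
    m = normalize_columns(sim)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: two-step random walk
        inflated = normalize_columns(
            [[v ** inflation for v in row] for row in expanded]
        )  # inflation: strengthen strong flows, weaken weak ones
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that still carries flow defines a cluster: its members are
    # the columns with non-negligible mass. Identical rows are merged.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6)
                for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)


if __name__ == "__main__":
    # Toy sequences, not real SARS-CoV-2 data.
    toy = ["ACGTACGTAC", "ACGTACGTAA", "TTGGCCTTGG", "TTGGCCTTGA"]
    print(mcl(similarity_matrix(toy)))
```

For real GenBank data the dense list-of-lists matrices would need to be replaced by sparse structures, and the inflation parameter, which controls cluster granularity, would have to be tuned.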

Participants
