hidgenclassifier: An R package implementing methodologies described in "Mining Mutation Contexts across the Genome to Map Tumor Site of Origin" by Chakraborty et al. (Nat Commun 12, 3051 (2021), Link)

Overview

hidgenclassifier is an R package implementing the Bayesian hierarchical hidden genome classifier for cancer sites developed in "Mining Mutation Contexts across the Genome to Map Tumor Site of Origin" by Chakraborty, Martin, Guan, Begg and Shen (2021; Link). It provides pre-processing, fitting, and post-processing functions that together simplify the handling of genomic datasets for use in the classifier, facilitate training of the hidden genome model, compute predicted cancer type probabilities for new tumors from trained models, and support rigorous quantification of predictor effects (via odds ratios) in fitted models. The repository also includes a detailed vignette that demonstrates the hidden genome classification methodology using the hidgenclassifier package (rendered here), and an interactive html version of one of the figures (namely, Figure 1) displayed in the main manuscript.

Repo Contents

  • R: R package code.
  • data: filtered subsets of TCGA whole-exome and MSK-IMPACT targeted cancer gene panel sequencing datasets used in the analysis presented in the manuscript.
  • man: package manual for help in R session.
  • src: C++ source code implementing various computation-heavy back-end functions.
  • vignettes: R vignettes for R session html help pages.
  • figures: Interactive .html version of Figure 1 in the main manuscript.

System Requirements

Hardware Requirements

The package hidgenclassifier can be run on a standard computer with 2 GB of RAM. For optimal performance we recommend a computer with specs:

RAM: 16+ GB
CPU: 4+ cores, 3.3+ GHz/core

The installation times noted below were measured on a computer with the recommended specs (16 GB RAM, 4 cores @ 3.3 GHz) and a 100 Mbps internet connection.

Software Requirements

OS Requirements

The GitHub development version of hidgenclassifier has been tested on the following Linux and Windows operating systems:

Linux: CentOS Linux release 7.8.2003 (Core)
Windows: Windows 10

R version

The package hidgenclassifier depends on R v3.5.0 or newer. See the installation notes on the R project homepage for details on how to install the latest version of R.
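As a quick sanity check before installing, the running R version can be compared against the stated minimum. This is a minimal sketch using only base R; the variable names `min_version` and `r_ok` are illustrative and not part of hidgenclassifier.

```r
# Minimum R version stated in this README
min_version <- "3.5.0"

# getRversion() returns the version of the running R session;
# comparing it with a version string is supported in base R
r_ok <- getRversion() >= min_version

if (r_ok) {
  message("R ", as.character(getRversion()), " meets the ", min_version, " requirement.")
} else {
  message("R ", as.character(getRversion()), " is older than ", min_version,
          "; please upgrade R before installing.")
}
```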

R build tools

hidgenclassifier contains C++ source code, and thus requires a suitable C++ compiler to be pre-installed. On Windows computers, for example, this can be ensured by installing Rtools. See the CRAN manual on installing R packages for more details on installing source R packages on various platforms.
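One way to check for a build toolchain from within R is sketched below. It assumes the pkgbuild package (a dependency of devtools) may or may not be installed: if it is, `pkgbuild::has_build_tools()` is used; otherwise the code falls back to a rough heuristic that only checks whether `make` is on the PATH. The helper name `has_toolchain` is hypothetical and not part of hidgenclassifier.

```r
# Hypothetical helper: returns TRUE if a build toolchain appears to be available
has_toolchain <- function() {
  if (requireNamespace("pkgbuild", quietly = TRUE)) {
    # Ask pkgbuild to look for working build tools (Rtools on Windows)
    return(isTRUE(pkgbuild::has_build_tools(debug = FALSE)))
  }
  # Fallback heuristic (an assumption, not a full check):
  # Sys.which() returns "" when the program is not found on the PATH
  nzchar(Sys.which("make")[[1]])
}

if (has_toolchain()) {
  message("Build tools detected.")
} else {
  message("No build tools detected; install a C++ toolchain (e.g. Rtools on Windows).")
}
```

The fallback is deliberately conservative: finding `make` does not guarantee that a C++ compiler is configured, so a failed source installation remains the definitive signal.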

Installation Guide

Installing Bioconductor dependencies

hidgenclassifier depends on a number of Bioconductor packages. To install these dependencies run the following commands in R:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(
  c("SomaticSignatures",
    "VariantAnnotation",
    "IRanges",
    "BSgenome.Hsapiens.UCSC.hg19")
)
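After running the commands above, it can be useful to confirm which of the listed Bioconductor packages are actually present. The following is a minimal base-R sketch; the variable names are illustrative, and the package list simply repeats the four names given above.

```r
# Bioconductor dependencies named in this README
bioc_deps <- c(
  "SomaticSignatures",
  "VariantAnnotation",
  "IRanges",
  "BSgenome.Hsapiens.UCSC.hg19"
)

# system.file(package = p) returns "" when package p is not installed,
# and does not load the package
installed <- vapply(
  bioc_deps,
  function(p) nzchar(system.file(package = p)),
  logical(1)
)

missing_deps <- names(installed)[!installed]

if (length(missing_deps) > 0) {
  message("Still missing: ", paste(missing_deps, collapse = ", "))
} else {
  message("All listed Bioconductor dependencies are installed.")
}
```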

Installing hidgenclassifier

Once all the Bioconductor dependencies are installed, the easiest way to install hidgenclassifier from GitHub is via the R package devtools. Run the following commands in R to install devtools, if it is not already installed:

if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")

Then install hidgenclassifier as follows:

devtools::install_github("c7rishi/hidgenclassifier", build_vignettes = TRUE)

Typical install time

hidgenclassifier suggests a number of R packages (both on CRAN and on Bioconductor) for full functionality, and installing them all from scratch on a Windows/Mac computer using binary packages can take about 10 minutes. Install times on Linux machines, where binary packages are not available, can be substantially longer (~30 minutes). Installing only hidgenclassifier without the suggested packages takes about 1 minute.

Demo

After installation, a vignette illustrating an analysis of the publicly available MSK-IMPACT dataset (contained in the package) can be accessed by entering the following in the R console:

vignette("impact_analysis", package = "hidgenclassifier")

A rendered copy of the vignette from this repository can be found here.
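Because `vignette()` needs the exact vignette name, it may help to list the vignettes that were actually built and installed with the package. The sketch below uses only base R; the helper `list_vignettes` is hypothetical (not part of hidgenclassifier) and returns an empty character vector when the package is not installed or was installed without vignettes.

```r
# Hypothetical helper: names of the installed vignettes of a package
list_vignettes <- function(pkg) {
  # system.file() returns "" if the package is not installed
  if (!nzchar(system.file(package = pkg))) {
    return(character(0))
  }
  # vignette(package = ...) returns an object whose 'results' matrix
  # has one row per vignette; the "Item" column holds the vignette names
  info <- vignette(package = pkg)
  unname(info$results[, "Item"])
}

available <- list_vignettes("hidgenclassifier")

if (length(available) > 0) {
  message("Installed vignettes: ", paste(available, collapse = ", "))
} else {
  message("No vignettes found; reinstall with build_vignettes = TRUE.")
}
```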

Interactive html Version of Figure 1 in the Article

An interactive html version of Figure 1 in the article is included in this repository (inside figures). A rendered copy of the html figure is available here. If the rendered images do not load fully, we recommend downloading the html file from figures and then opening the file in a web browser.