hidgenclassifier: An R package implementing methodologies described in "Mining Mutation Contexts across the Genome to Map Tumor Site of Origin" by Chakraborty et al. (Nat Commun 12, 3051 (2021), Link)
hidgenclassifier is an R package implementing the Bayesian hierarchical hidden genome classifier for cancer sites developed in "Mining Mutation Contexts across the Genome to Map Tumor Site of Origin" by Chakraborty, Martin, Guan, Begg and Shen (2021; Link). It provides pre-processing, fitting, and post-processing functions that together simplify the handling of genomic datasets for use in the classifier, facilitate training of the hidden genome model, compute predicted cancer type probabilities for new tumors from trained models, and support rigorous quantification of predictor effects (via odds ratios) in fitted models. The repository also includes a detailed vignette illustrating the hidden genome classification methodology with the hidgenclassifier package (rendered here), and an interactive HTML version of Figure 1 of the main manuscript.
- R: R package code.
- data: filtered subsets of the TCGA whole-exome and MSK-IMPACT targeted cancer gene panel sequencing datasets used in the analysis presented in the manuscript.
- man: package manual pages for help within an R session.
- src: C++ source code implementing various computation-heavy back-end functions.
- vignettes: R vignettes, rendered as HTML help pages within an R session.
- figures: interactive .html version of Figure 1 in the main manuscript.
hidgenclassifier can run on a standard computer with 2 GB of RAM. For optimal performance, we recommend a computer with the following specs:
- RAM: 16+ GB
- CPU: 4+ cores, 3.3+ GHz/core
The installation times noted below were measured on a computer with the recommended specs (16 GB RAM, 4 cores @ 3.3 GHz) and a 100 Mbps internet connection.
The GitHub development version of hidgenclassifier has been tested on the following operating systems:
- Linux: CentOS Linux release 7.8.2003 (Core)
- Windows: Windows 10
The package hidgenclassifier depends on R v3.5.0 or newer. See the installation notes on the R project homepage for details on how to install the latest version of R.
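Before installing, the running R version can be checked against this minimum from within R. The following is a minimal sketch using base R's `getRversion()`; the helper name `meets_r_requirement()` is hypothetical and not part of hidgenclassifier.

```r
# Hypothetical helper (not part of hidgenclassifier): returns TRUE when the
# running R version is at least `min_version`. getRversion() returns a
# package_version object, so the comparison is component-wise
# (e.g. "3.10.0" is correctly treated as newer than "3.5.0").
meets_r_requirement <- function(min_version = "3.5.0") {
  getRversion() >= min_version
}

if (!meets_r_requirement()) {
  stop("hidgenclassifier requires R >= 3.5.0, but this session runs R ",
       as.character(getRversion()))
}
```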
hidgenclassifier contains C++ source code and therefore requires a working C++ compiler toolchain to be pre-installed. On Windows, for example, this can be ensured by installing Rtools. See the CRAN manual on installing R packages for more details on installing source R packages on various platforms.
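One way to check whether R can find a working compiler toolchain is the `has_build_tools()` function from the pkgbuild package, which devtools itself depends on. The sketch below assumes pkgbuild is installed and returns `NA` otherwise; the wrapper name `has_toolchain()` is hypothetical.

```r
# Hypothetical wrapper (not part of hidgenclassifier). Assumes the 'pkgbuild'
# package is installed (it is a dependency of devtools).
# Returns TRUE/FALSE from pkgbuild::has_build_tools(), or NA if pkgbuild
# is unavailable.
has_toolchain <- function(debug = FALSE) {
  if (!requireNamespace("pkgbuild", quietly = TRUE)) {
    message("Package 'pkgbuild' is not installed; cannot check build tools.")
    return(NA)
  }
  # debug = TRUE prints extra diagnostic output from the toolchain check
  pkgbuild::has_build_tools(debug = debug)
}

if (isFALSE(has_toolchain())) {
  message("No working C/C++ toolchain found; on Windows, install Rtools.")
}
```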
hidgenclassifier depends on a number of Bioconductor packages. To install these dependencies, run the following commands in R:
```r
# Install BiocManager first if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

BiocManager::install(
  c("SomaticSignatures",
    "VariantAnnotation",
    "IRanges",
    "BSgenome.Hsapiens.UCSC.hg19")
)
```
Once the Bioconductor dependencies are all installed, the easiest way to install hidgenclassifier from GitHub is via the R package devtools. If devtools is not already installed, install it by running the following commands in R:
```r
if (!requireNamespace("devtools", quietly = TRUE))
  install.packages("devtools")
```
Then install hidgenclassifier as follows:
```r
devtools::install_github("c7rishi/hidgenclassifier", build_vignettes = TRUE)
```
hidgenclassifier suggests a number of R packages (both on CRAN and on Bioconductor) for full functionality. Installing them all from scratch on a Windows or Mac computer using binary packages can take about 10 minutes. On Linux machines, where binary packages are typically not available and packages must be compiled from source, install times can be substantially longer (~30 minutes). Installing only hidgenclassifier, without the suggested packages, takes about 1 minute.
After installation, a vignette illustrating an analysis of the publicly available MSK-IMPACT dataset (contained in the package) can be accessed by entering the following in the R console:
```r
vignette("impact_anlaysis", package = "hidgenclassifier")
```
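To see which vignettes are installed with the package, together with the exact names `vignette()` accepts, they can be listed from R. The sketch below uses base R's `vignette(package = ...)`, whose `results` matrix has an `Item` column holding those names; the helper `list_vignettes()` is hypothetical.

```r
# Hypothetical helper (not part of hidgenclassifier): returns the names of the
# vignettes installed with package `pkg`, or character(0) if the package is
# not installed. The "Item" column of vignette(package = pkg)$results holds
# the names that can be passed to vignette().
list_vignettes <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(character(0))
  }
  vignette(package = pkg)$results[, "Item"]
}

list_vignettes("hidgenclassifier")
```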
A rendered copy of the vignette from this repository can be found here.
An interactive HTML version of Figure 1 of the article is included in this repository (in the figures folder). A rendered copy of the HTML figure is available here. If the rendered figure does not load fully, we recommend downloading the HTML file from figures and opening it in a web browser.