Commit c621055

jimb3 and claude committed
Add build_vignettes install option to README
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent b6b80c4 commit c621055

3 files changed

+116
-39
lines changed

DESCRIPTION

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 Package: BinaryDosage
 Title: Creates, Merges, and Reads Binary Dosage Files
-Version: 1.0.0.9020
+Version: 1.0.0.9021
 Authors@R:
 c(person(given = "John",
 family = "Morrison",

README.Rmd

Lines changed: 9 additions & 1 deletion
@@ -104,7 +104,7 @@ There are 4 formats for a binary dosage data set. Data sets in formats 1, 2, and
 # Installation
 
 1. Install the [devtools](https://github.com/r-lib/devtools) package
-2. Install the [BinaryDosage](https://github.com/USCbiostats/BinaryDosage) package directly from the USCbiostats repository on GitHub:
+2. Install the [BinaryDosage](https://github.com/USCbiostats/BinaryDosage) package directly from the USCbiostats repository on GitHub:
 
 ``` {r, eval = F}
 remove.packages("BinaryDosage")
@@ -113,6 +113,14 @@ devtools::install_github("https://github.com/USCbiostats/BinaryDosage")
 library(BinaryDosage)
 ```
 
+To install the package with vignettes built, use the `build_vignettes` option:
+
+``` {r, eval = F}
+devtools::install_github("https://github.com/USCbiostats/BinaryDosage", build_vignettes = TRUE)
+
+library(BinaryDosage)
+```
+
 # Usage
 
 #### General Workflow

README.md

Lines changed: 106 additions & 37 deletions
@@ -5,12 +5,73 @@ BinaryDosage: Creates, Merges, and Reads Binary Dosage Files
 
 [![AppVeyor build
 status](https://ci.appveyor.com/api/projects/status/github/USCbiostats/BinaryDosage?branch=master&svg=true)](https://ci.appveyor.com/project/USCbiostats/BinaryDosage)
+[![Travis build
+status](https://travis-ci.org/USCbiostats/BinaryDosage.svg?branch=master)](https://app.travis-ci.com/USCbiostats/BinaryDosage)
 [![Codecov test
 coverage](https://codecov.io/gh/USCbiostats/BinaryDosage/branch/master/graph/badge.svg)](https://app.codecov.io/gh/USCbiostats/BinaryDosage?branch=master)
 <!-- badges: end -->
 
 # Binary Dosage Files
 
+### Important News
+
+A new version of BinaryDosage has been developed that significantly
+reduces data read times by a factor of more than 10 times. This new
+version uses the hstlib libraries which greatly improves the read speed
+of VCF files. To compile this new version requires the installation of
+the
+[Rhtslib](https://bioconductor.org/packages/release/bioc/html/Rhtslib.html)
+library from Bioconductor.
+
+Data compression of the BinaryDosage formatted files has also been
+improved. We have had reports that the BinaryDosage formatted files were
+over 3 times larger than the gzipped VCF file. This was due to the
+compression routine not compressing SNPs with low minor allele
+frequencies (\<0.01) well. When BinaryDosage was first written,
+imputation servers did not include many rare SNPs. This has changed
+since BinaryDosage was first written.
+
+To install the latest version of BinaryDosage, it is recommended the
+user have R 4.3.x or higher. If the user is using Windows, they will
+need to verify that the current version of [R
+tools](https://cran.r-project.org/bin/windows/Rtools/) is installed. If
+the user is using Linux or Mac OS X, the zlib development tools need to
+be installed, often named zlib1g-dev. For most systems, these tools are
+usually already loaded.
+
+The package
+[Rhtslib](https://bioconductor.org/packages/release/bioc/html/Rhtslib.html)
+from BioConductor needs to be installed using the following code.
+
+``` r
+if (!require("BiocManager", quietly = TRUE))
+install.packages("BiocManager")
+
+BiocManager::install("Rhtslib")
+```
+
+Once the preceding prerequisites are met the follow code will install
+the latest version of BinaryDosage.
+
+``` r
+remove.packages("BinaryDosage")
+devtools::install_github("https://github.com/USCbiostats/BinaryDosage@htslib")
+
+library(BinaryDosage)
+```
+
+#### Important
+
+All BinaryDosage formatted files created with older versions are fully
+compatible with this new version of BinaryDosage.
+[GxEScanR](https://github.com/USCbiostats/GxEScanR) works with files
+created by all versions of BinaryDosage, including this new one.
+
+The information below is for the current release version of
+BinaryDosage. Visit the [htslib
+branch](https://github.com/USCbiostats/BinaryDosage/tree/htslib) or
+BinaryDosage for more information about the new version.
+
 ### Introduction
 
 Genotype imputation is an essential tool in genomics, enabling
@@ -48,24 +109,23 @@ For GWAS/GWIS analysis of BinaryDosage files, please refer to the
 - Family ID
 - Subject ID
 - SNP information
-- Chromosome number
-- SNP ID
-- Location in base pairs
-- Reference allele
-- Alternate allele
+- Chromosome number\
+- SNP ID\
+- Location in base pairs\
+- Reference allele\
+- Alternate allele\
 - Genetic information
 - Dosage values
 - Genotype probabilities, Pr(*g=0*), Pr(*g=1*), Pr(*g=2*)
 
-There are 5 formats for a binary dosage data set. Data sets in formats
+There are 4 formats for a binary dosage data set. Data sets in formats
 1, 2, and 3 have 3 files, a sample information file, a SNP information
 file, and a genetic information file. Data sets in format 4 have just 1
-file. Format 5 uses per-SNP gzip compression and stores metadata in a
-companion RDS file (`.bdose.bdi`), named by appending `.bdi` to the
-`.bdose` filename. This file contains all the information listed above
-and may contain the following information.
+file. This file contains all the information listed above and may
+contain the following information.
 
-**Note:** Format 5 is the recommended format for new data sets.
+**Note:** Format 4 is recommended and is the default value for all
+functions.
 
 - Additional SNP information
 - Alternate allele frequency
@@ -78,27 +138,27 @@ and may contain the following information.
 
 ### Functions
 
-#### Format 5 (recommended)
-
-- **vcftobd** - Converts a bgzipped VCF file to a Format 5 binary dosage data set (requires vcfppR)
-- **getbd5info** - Loads a Format 5 file pair and returns an R list (required for **getsnp**, **bdapply**, and **mergebd**)
-- **getbd5snp** - Reads a single SNP from a Format 5 file by index or ID
-- **updatebd** - Converts a legacy format (1–4) binary dosage file to Format 5
-- **subsetbd** - Creates a new Format 5 file containing a subset of SNPs and/or subjects from any binary dosage file (formats 1–5)
-- **mergebd** - Merges two or more Format 5 files into a single Format 5 file
-
-#### Legacy formats (1–4)
-
-- **vcftobdlegacy** - Converts a VCF file to a legacy format (1–4) binary dosage data set (deprecated; use **vcftobd** instead)
-- **gentobd** - Converts a GEN (impute2) file to a binary dosage data set
-- **bdmerge** - Merges multiple legacy binary dosage data sets into a single data set
-- **getbdinfo** - Creates an R list containing information about a binary dosage data set (required for **getsnp** and **bdapply**)
-- **getvcfinfo** - Creates an R list containing information about a VCF file (required for **vcfapply**)
-- **getgeninfo** - Creates an R list containing information about a GEN file (required for **genapply**)
-- **bdapply** - Applies a function to the data for each SNP in a binary dosage file (requires list returned by **getbdinfo** or **getbd5info**)
-- **vcfapply** - Applies a function to the data for each SNP in a VCF file (requires list returned by **getvcfinfo**)
-- **genapply** - Applies a function to the data for each SNP in a GEN file (requires list returned by **getgeninfo**)
-- **getsnp** - Returns dosage and genotype probabilities for a single SNP from a binary dosage file
+- **vcftobd** - Converts a VCF file to a Format 5 binary dosage data set
+- **vcftobdlegacy** - Converts a VCF file to a legacy format (1-4)
+binary dosage data set
+- **gentobd** - Converts a GEN (impute2) file to a binary dosage data
+set
+- **bdmerge** - Merges multiple binary dosage data sets into a single
+data set
+- **getbdinfo** - Creates an R List containing information about a
+binary dosage data set (required for **getsnp** and **bdapply**)
+- **getvcfinfo** - Creates an R List containing information about a VCF
+file (required for **vcfapply**)
+- **getgeninfo** - Creates an R List containing information about a GEN
+file (required for **genapply**)
+- **bdapply** - Applies a function to the data for each SNP in a binary
+dosage file (requires list returned by **getbdinfo**)
+- **vcfapply** - Applies a function to the data for each SNP in a VCF
+file (requires list returned by **getvcfinfo**)
+- **genapply** - Applies a function to the data for each SNP in a GEN
+file (requires list returned by **getgeninfo**)
+- **getsnp** - Obtain genotype Dosages/Genotype Probabilities from a
+binary dosage file, outputs results to an R list
 
 # Installation
 
@@ -114,6 +174,15 @@ devtools::install_github("https://github.com/USCbiostats/BinaryDosage")
 library(BinaryDosage)
 ```
 
+To install the package with vignettes built, use the `build_vignettes`
+option:
+
+``` r
+devtools::install_github("https://github.com/USCbiostats/BinaryDosage", build_vignettes = TRUE)
+
+library(BinaryDosage)
+```
+
 # Usage
 
 #### General Workflow
@@ -210,11 +279,11 @@ mergebd3 <- tempfile()
 Converting a VCF file into a binary dosage file is simple. The user
 passes the names of the VCF and information files along with the name
 for the binary dosage file to the
-<span style="font-family:Courier">vcftobdlegacy</span> function. There are
-some options available for the
-<span style="font-family:Courier">vcftobdlegacy</span> functions such as using
-gz compressed files vcf files. More information about these options can
-be found using the help files or reading the vignette
+<span style="font-family:Courier">vcftobdlegacy</span> function. There
+are some options available for the
+<span style="font-family:Courier">vcftobdlegacy</span> functions such as
+using gz compressed files vcf files. More information about these
+options can be found using the help files or reading the vignette
 <span style="font-family:Courier">usingvcffiles</span>.
 
 The following commands convert VCF data sets 1a and 1b into the binary