Commit c4684f3

jimb3 and claude committed
Restore correct README: remove Travis badge and htslib section
Re-sync README.Rmd with the state of README.md from commit 353cb4d, which had been edited directly without updating the .Rmd source. Removes the Travis CI badge, removes the outdated htslib Important News block, updates the format count to 5, marks Format 5 as recommended, and reorganises the functions list into Format 5 and legacy sections.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent c621055 commit c4684f3

3 files changed: +54 -119 lines changed

DESCRIPTION

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 Package: BinaryDosage
 Title: Creates, Merges, and Reads Binary Dosage Files
-Version: 1.0.0.9021
+Version: 1.0.0.9022
 Authors@R:
 c(person(given = "John",
 family = "Morrison",

README.Rmd

Lines changed: 21 additions & 41 deletions
@@ -13,42 +13,11 @@ knitr::opts_chunk$set(
 
 <!-- badges: start -->
 [![AppVeyor build status](https://ci.appveyor.com/api/projects/status/github/USCbiostats/BinaryDosage?branch=master&svg=true)](https://ci.appveyor.com/project/USCbiostats/BinaryDosage)
-[![Travis build status](https://travis-ci.org/USCbiostats/BinaryDosage.svg?branch=master)](https://app.travis-ci.com/USCbiostats/BinaryDosage)
 [![Codecov test coverage](https://codecov.io/gh/USCbiostats/BinaryDosage/branch/master/graph/badge.svg)](https://app.codecov.io/gh/USCbiostats/BinaryDosage?branch=master)
 <!-- badges: end -->
 
 # Binary Dosage Files
 
-### Important News
-
-A new version of BinaryDosage has been developed that significantly reduces data read times by a factor of more than 10 times. This new version uses the hstlib libraries which greatly improves the read speed of VCF files. To compile this new version requires the installation of the [Rhtslib](https://bioconductor.org/packages/release/bioc/html/Rhtslib.html) library from Bioconductor.
-
-Data compression of the BinaryDosage formatted files has also been improved. We have had reports that the BinaryDosage formatted files were over 3 times larger than the gzipped VCF file. This was due to the compression routine not compressing SNPs with low minor allele frequencies (<0.01) well. When BinaryDosage was first written, imputation servers did not include many rare SNPs. This has changed since BinaryDosage was first written.
-
-To install the latest version of BinaryDosage, it is recommended the user have R 4.3.x or higher. If the user is using Windows, they will need to verify that the current version of [R tools](https://cran.r-project.org/bin/windows/Rtools/) is installed. If the user is using Linux or Mac OS X, the zlib development tools need to be installed, often named zlib1g-dev. For most systems, these tools are usually already loaded.
-
-The package [Rhtslib](https://bioconductor.org/packages/release/bioc/html/Rhtslib.html) from BioConductor needs to be installed using the following code.
-``` {r, eval = F}
-if (!require("BiocManager", quietly = TRUE))
-install.packages("BiocManager")
-
-BiocManager::install("Rhtslib")
-```
-
-Once the preceding prerequisites are met the follow code will install the latest version of BinaryDosage.
-``` {r, eval = F}
-remove.packages("BinaryDosage")
-devtools::install_github("https://github.com/USCbiostats/BinaryDosage@htslib")
-
-library(BinaryDosage)
-```
-
-#### Important
-
-All BinaryDosage formatted files created with older versions are fully compatible with this new version of BinaryDosage. [GxEScanR](https://github.com/USCbiostats/GxEScanR) works with files created by all versions of BinaryDosage, including this new one.
-
-The information below is for the current release version of BinaryDosage. Visit the [htslib branch](https://github.com/USCbiostats/BinaryDosage/tree/htslib) or BinaryDosage for more information about the new version.
-
 ### Introduction
 
 Genotype imputation is an essential tool in genomics, enabling association testing with markers not directly genotyped, increasing statistical power, and facilitating data pooling between studies that employ different genotyping platforms. Two commonly used software packages for imputation are [minimac](https://genome.sph.umich.edu/wiki/Minimac) and [Impute2](http://mathgen.stats.ox.ac.uk/impute/impute_v2.html). Furthermore, services such as the [Michigan Imputation Server](https://imputationserver.sph.umich.edu/index.html) have made genotype imputation much more accessible and streamlined.
@@ -75,9 +44,9 @@ For GWAS/GWIS analysis of BinaryDosage files, please refer to the [**GxEScanR**]
 + Dosage values
 + Genotype probabilities, Pr(*g=0*), Pr(*g=1*), Pr(*g=2*)
 
-There are 4 formats for a binary dosage data set. Data sets in formats 1, 2, and 3 have 3 files, a sample information file, a SNP information file, and a genetic information file. Data sets in format 4 have just 1 file. This file contains all the information listed above and may contain the following information.
+There are 5 formats for a binary dosage data set. Data sets in formats 1, 2, and 3 have 3 files, a sample information file, a SNP information file, and a genetic information file. Data sets in format 4 have just 1 file. Format 5 uses per-SNP gzip compression and stores metadata in a companion RDS file (`.bdinfo`). This file contains all the information listed above and may contain the following information.
 
-**Note:** Format 4 is recommended and is the default value for all functions.
+**Note:** Format 5 is the recommended format for new data sets.
 
 - Additional SNP information
 + Alternate allele frequency
@@ -89,17 +58,28 @@ There are 4 formats for a binary dosage data set. Data sets in formats 1, 2, and
 + Sample size of each data set merged
 
 ### Functions
-- **vcftobd** - Converts a VCF file to a Format 5 binary dosage data set
-- **vcftobdlegacy** - Converts a VCF file to a legacy format (1-4) binary dosage data set
+
+#### Format 5 (recommended)
+
+- **vcftobd** - Converts a bgzipped VCF file to a Format 5 binary dosage data set (requires vcfppR)
+- **getbd5info** - Loads a Format 5 file pair and returns an R list (required for **getsnp**, **bdapply**, and **mergebd**)
+- **getbd5snp** - Reads a single SNP from a Format 5 file by index or ID
+- **updatebd** - Converts a legacy format (1–4) binary dosage file to Format 5
+- **subsetbd** - Creates a new Format 5 file containing a subset of SNPs and/or subjects from any binary dosage file (formats 1–5)
+- **mergebd** - Merges two or more Format 5 files into a single Format 5 file
+
+#### Legacy formats (1–4)
+
+- **vcftobdlegacy** - Converts a VCF file to a legacy format (1–4) binary dosage data set (deprecated; use **vcftobd** instead)
 - **gentobd** - Converts a GEN (impute2) file to a binary dosage data set
-- **bdmerge** - Merges multiple binary dosage data sets into a single data set
-- **getbdinfo** - Creates an R List containing information about a binary dosage data set (required for **getsnp** and **bdapply**)
-- **getvcfinfo** - Creates an R List containing information about a VCF file (required for **vcfapply**)
-- **getgeninfo** - Creates an R List containing information about a GEN file (required for **genapply**)
-- **bdapply** - Applies a function to the data for each SNP in a binary dosage file (requires list returned by **getbdinfo**)
+- **bdmerge** - Merges multiple legacy binary dosage data sets into a single data set
+- **getbdinfo** - Creates an R list containing information about a binary dosage data set (required for **getsnp** and **bdapply**)
+- **getvcfinfo** - Creates an R list containing information about a VCF file (required for **vcfapply**)
+- **getgeninfo** - Creates an R list containing information about a GEN file (required for **genapply**)
+- **bdapply** - Applies a function to the data for each SNP in a binary dosage file (requires list returned by **getbdinfo** or **getbd5info**)
 - **vcfapply** - Applies a function to the data for each SNP in a VCF file (requires list returned by **getvcfinfo**)
 - **genapply** - Applies a function to the data for each SNP in a GEN file (requires list returned by **getgeninfo**)
-- **getsnp** - Obtain genotype Dosages/Genotype Probabilities from a binary dosage file, outputs results to an R list
+- **getsnp** - Returns dosage and genotype probabilities for a single SNP from a binary dosage file
 
 # Installation
 

README.md

Lines changed: 32 additions & 77 deletions
@@ -5,73 +5,12 @@ BinaryDosage: Creates, Merges, and Reads Binary Dosage Files
 
 [![AppVeyor build
 status](https://ci.appveyor.com/api/projects/status/github/USCbiostats/BinaryDosage?branch=master&svg=true)](https://ci.appveyor.com/project/USCbiostats/BinaryDosage)
-[![Travis build
-status](https://travis-ci.org/USCbiostats/BinaryDosage.svg?branch=master)](https://app.travis-ci.com/USCbiostats/BinaryDosage)
 [![Codecov test
 coverage](https://codecov.io/gh/USCbiostats/BinaryDosage/branch/master/graph/badge.svg)](https://app.codecov.io/gh/USCbiostats/BinaryDosage?branch=master)
 <!-- badges: end -->
 
 # Binary Dosage Files
 
-### Important News
-
-A new version of BinaryDosage has been developed that significantly
-reduces data read times by a factor of more than 10 times. This new
-version uses the hstlib libraries which greatly improves the read speed
-of VCF files. To compile this new version requires the installation of
-the
-[Rhtslib](https://bioconductor.org/packages/release/bioc/html/Rhtslib.html)
-library from Bioconductor.
-
-Data compression of the BinaryDosage formatted files has also been
-improved. We have had reports that the BinaryDosage formatted files were
-over 3 times larger than the gzipped VCF file. This was due to the
-compression routine not compressing SNPs with low minor allele
-frequencies (\<0.01) well. When BinaryDosage was first written,
-imputation servers did not include many rare SNPs. This has changed
-since BinaryDosage was first written.
-
-To install the latest version of BinaryDosage, it is recommended the
-user have R 4.3.x or higher. If the user is using Windows, they will
-need to verify that the current version of [R
-tools](https://cran.r-project.org/bin/windows/Rtools/) is installed. If
-the user is using Linux or Mac OS X, the zlib development tools need to
-be installed, often named zlib1g-dev. For most systems, these tools are
-usually already loaded.
-
-The package
-[Rhtslib](https://bioconductor.org/packages/release/bioc/html/Rhtslib.html)
-from BioConductor needs to be installed using the following code.
-
-``` r
-if (!require("BiocManager", quietly = TRUE))
-install.packages("BiocManager")
-
-BiocManager::install("Rhtslib")
-```
-
-Once the preceding prerequisites are met the follow code will install
-the latest version of BinaryDosage.
-
-``` r
-remove.packages("BinaryDosage")
-devtools::install_github("https://github.com/USCbiostats/BinaryDosage@htslib")
-
-library(BinaryDosage)
-```
-
-#### Important
-
-All BinaryDosage formatted files created with older versions are fully
-compatible with this new version of BinaryDosage.
-[GxEScanR](https://github.com/USCbiostats/GxEScanR) works with files
-created by all versions of BinaryDosage, including this new one.
-
-The information below is for the current release version of
-BinaryDosage. Visit the [htslib
-branch](https://github.com/USCbiostats/BinaryDosage/tree/htslib) or
-BinaryDosage for more information about the new version.
-
 ### Introduction
 
 Genotype imputation is an essential tool in genomics, enabling
@@ -118,14 +57,14 @@ For GWAS/GWIS analysis of BinaryDosage files, please refer to the
 - Dosage values
 - Genotype probabilities, Pr(*g=0*), Pr(*g=1*), Pr(*g=2*)
 
-There are 4 formats for a binary dosage data set. Data sets in formats
+There are 5 formats for a binary dosage data set. Data sets in formats
 1, 2, and 3 have 3 files, a sample information file, a SNP information
 file, and a genetic information file. Data sets in format 4 have just 1
-file. This file contains all the information listed above and may
-contain the following information.
+file. Format 5 uses per-SNP gzip compression and stores metadata in a
+companion RDS file (`.bdinfo`). This file contains all the information
+listed above and may contain the following information.
 
-**Note:** Format 4 is recommended and is the default value for all
-functions.
+**Note:** Format 5 is the recommended format for new data sets.
 
 - Additional SNP information
 - Alternate allele frequency
@@ -138,27 +77,43 @@ functions.
 
 ### Functions
 
-- **vcftobd** - Converts a VCF file to a Format 5 binary dosage data set
-- **vcftobdlegacy** - Converts a VCF file to a legacy format (1-4)
-binary dosage data set
+#### Format 5 (recommended)
+
+- **vcftobd** - Converts a bgzipped VCF file to a Format 5 binary dosage
+data set (requires vcfppR)
+- **getbd5info** - Loads a Format 5 file pair and returns an R list
+(required for **getsnp**, **bdapply**, and **mergebd**)
+- **getbd5snp** - Reads a single SNP from a Format 5 file by index or ID
+- **updatebd** - Converts a legacy format (1–4) binary dosage file to
+Format 5
+- **subsetbd** - Creates a new Format 5 file containing a subset of SNPs
+and/or subjects from any binary dosage file (formats 1–5)
+- **mergebd** - Merges two or more Format 5 files into a single Format 5
+file
+
+#### Legacy formats (1–4)
+
+- **vcftobdlegacy** - Converts a VCF file to a legacy format (1–4)
+binary dosage data set (deprecated; use **vcftobd** instead)
 - **gentobd** - Converts a GEN (impute2) file to a binary dosage data
 set
-- **bdmerge** - Merges multiple binary dosage data sets into a single
-data set
-- **getbdinfo** - Creates an R List containing information about a
+- **bdmerge** - Merges multiple legacy binary dosage data sets into a
+single data set
+- **getbdinfo** - Creates an R list containing information about a
 binary dosage data set (required for **getsnp** and **bdapply**)
-- **getvcfinfo** - Creates an R List containing information about a VCF
+- **getvcfinfo** - Creates an R list containing information about a VCF
 file (required for **vcfapply**)
-- **getgeninfo** - Creates an R List containing information about a GEN
+- **getgeninfo** - Creates an R list containing information about a GEN
 file (required for **genapply**)
 - **bdapply** - Applies a function to the data for each SNP in a binary
-dosage file (requires list returned by **getbdinfo**)
+dosage file (requires list returned by **getbdinfo** or
+**getbd5info**)
 - **vcfapply** - Applies a function to the data for each SNP in a VCF
 file (requires list returned by **getvcfinfo**)
 - **genapply** - Applies a function to the data for each SNP in a GEN
 file (requires list returned by **getgeninfo**)
-- **getsnp** - Obtain genotype Dosages/Genotype Probabilities from a
-binary dosage file, outputs results to an R list
+- **getsnp** - Returns dosage and genotype probabilities for a single
+SNP from a binary dosage file
 
 # Installation
 
