
Commit 0d532c3

jimb3 and claude committed
Update vignettes: legacy notice in bdformats, faster SNP loops in format5
Mark formats 1-4 as legacy in bdformats.Rmd and link to the Format 5 vignette.
Replace the getbd5snp loop in usingformat5files.Rmd with bdapply and a
getbd5snp_con example, and link to bd5snpreaders for details.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 6936144 commit 0d532c3

3 files changed: 45 additions, 13 deletions


DESCRIPTION

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 Package: BinaryDosage
 Title: Creates, Merges, and Reads Binary Dosage Files
-Version: 1.0.0.9025
+Version: 1.0.0.9026
 Authors@R:
 c(person(given = "John",
 family = "Morrison",

vignettes/bdformats.Rmd

Lines changed: 10 additions & 1 deletion
@@ -14,7 +14,16 @@ knitr::opts_chunk$set(
 )
 ```
 
-There are currently 4 formats for the binary dosage file.
+**Note:** This vignette describes formats 1 through 4, which are legacy formats.
+Format 5 is the recommended format for new work. It uses per-SNP gzip
+compression and stores metadata in a companion RDS file, making it simpler and
+more efficient to use. See the
+[Using Format 5 Binary Dosage Files](usingformat5files.html) vignette for
+details.
+
+---
+
+There are currently 4 legacy formats for the binary dosage file.
 
 The first three formats consist of three files, binary dosage, family, and map. The family and maps are data frames with information about the subjects and SNPs in the binary dosage file, respectively. These are data frames saved with the <span style="font-family:Courier">saveRDS</span> command.
 

vignettes/usingformat5files.Rmd

Lines changed: 34 additions & 11 deletions
@@ -223,23 +223,46 @@ knitr::kable(head(snp3_df, 10),
 
 # Applying a function to all SNPs
 
-There is no dedicated `bd5apply` function yet, but the same pattern used by
-`bdapply` can be reproduced with a simple loop. The following example
-calculates the alternate allele frequency for every SNP.
+The simplest option is `bdapply`, which handles the loop internally and uses
+`getbd5snp_con` automatically for Format 5 files.
 
-```{r apply, eval = TRUE, echo = T, message = F, warning = F}
-n_snps <- nrow(bd5$snps)
-aaf <- numeric(n_snps)
+```{r apply_bdapply, eval = TRUE, echo = T, message = F, warning = F}
+getaaf <- function(dosage, p0, p1, p2) mean(dosage, na.rm = TRUE) / 2
+
+aaf_list <- bdapply(bdinfo = bd5, func = getaaf)
+aaf_table <- data.frame(snpid = bd5$snps$snpid,
+                        aaf = round(unlist(aaf_list), 4))
+
+knitr::kable(aaf_table, caption = "Alternate allele frequencies via bdapply")
+```
 
+For manual loops, `getbd5snp_buf` and `getbd5snp_con` are faster than
+`getbd5snp` because they avoid allocating a new list on every call.
+`getbd5snp_con` is the fastest option as it also keeps the file open across
+all iterations. See the
+[Reading SNPs from Format 5 Files](bd5snpreaders.html) vignette for details
+and a performance comparison.
+
+```{r apply_con, eval = TRUE, echo = T, message = F, warning = F}
+n_snps <- nrow(bd5$snps)
+n_samp <- nrow(bd5$samples)
+dosage <- numeric(n_samp)
+p0 <- numeric(n_samp)
+p1 <- numeric(n_samp)
+p2 <- numeric(n_samp)
+
+con <- openbd5con(bd5)
+aaf <- numeric(n_snps)
 for (i in seq_len(n_snps)) {
-  dosage <- getbd5snp(bd5info = bd5, snp = i)$dosage
+  getbd5snp_con(bd5info = bd5, snp = i,
+                dosage = dosage, p0 = p0, p1 = p1, p2 = p2,
+                bd5con = con)
   aaf[i] <- mean(dosage, na.rm = TRUE) / 2
 }
+closebd5con(con)
 
-aaf_table <- data.frame(snpid = bd5$snps$snpid,
-                        aaf = round(aaf, 4))
-
-knitr::kable(aaf_table, caption = "Alternate allele frequencies")
+knitr::kable(data.frame(snpid = bd5$snps$snpid, aaf = round(aaf, 4)),
+             caption = "Alternate allele frequencies via getbd5snp_con")
 ```
 
 ```{r cleanup, include = FALSE, eval = TRUE}
