Add mergebd vignette; mark bdmerge as legacy

jimb3 · claude · jimb3 · commit d64a4884fe03 · 2026-04-02T11:36:21.000-07:00
Add vignettes/mergingbd5files.Rmd covering mergebd with subject-merge and
SNP-merge examples. Add legacy notice to mergingfiles.Rmd linking to the
new vignette.

Co-Authored-By: Claude Sonnet 4.6 &lt;noreply@anthropic.com&gt;
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: BinaryDosage
 Title: Creates, Merges, and Reads Binary Dosage Files
-Version: 1.0.0.9026
+Version: 1.0.0.9027
 Authors@R:
     c(person(given = "John",
              family = "Morrison",
diff --git a/vignettes/mergingbd5files.Rmd b/vignettes/mergingbd5files.Rmd
@@ -0,0 +1,167 @@
+---
+title: "Merging Format 5 Binary Dosage Files"
+output:
+  rmarkdown::html_vignette:
+    toc: true
+vignette: >
+  %\VignetteIndexEntry{Merging Format 5 Binary Dosage Files}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r, include = FALSE}
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  comment = "#>"
+)
+```
+
+```{r setup, echo = FALSE}
+library(BinaryDosage)
+```
+
+# Introduction
+
+The `mergebd` function merges two or more Format 5 binary dosage files into a
+single Format 5 output file. The merge type is detected automatically from the
+input files.
+
+- **Subject merge** — subject IDs do not overlap across files. The output
+  contains all subjects from every input file and the SNPs common to all files.
+- **SNP merge** — SNP IDs do not overlap across files. The output contains all
+  SNPs from every input file and the subjects common to all files.
+
+If both subject IDs and SNP IDs overlap across files an error is returned,
+since the merge type is ambiguous.
+
+SNPs are identified by chromosome, position, reference allele, and alternate
+allele, regardless of the SNP ID format stored in each file.
+
+The function takes the following parameters.
+
+- `bdose_files` — character vector of paths to the input `.bdose` files (at
+  least two). The companion `.bdi` file for each is expected at
+  `paste0(bdose_files[i], ".bdi")`.
+- `bdose_file` — path for the output `.bdose` file. The companion `.bdi` file
+  is written automatically to `paste0(bdose_file, ".bdi")`.
+
+# Setup
+
+The examples below use the bgzipped VCF file included with the package,
+*set1a.vcf.gz*, which contains data for 60 subjects and 10 SNPs on chromosome
+1. All output files are written to a temporary directory.
+
+```{r setup_files, message = FALSE, warning = FALSE}
+vcf_file   <- system.file("extdata", "set1a.vcf.gz", package = "BinaryDosage")
+bdose_full <- file.path(tempdir(), "full.bdose")
+
+vcftobd(vcffile = vcf_file, bdose_file = bdose_full)
+bd_full <- getbdinfo(bdose_full)
+
+cat("Subjects:", nrow(bd_full$samples), "\n")
+cat("SNPs:    ", nrow(bd_full$snps),    "\n")
+```
+
+# Subject merge
+
+A subject merge combines files that cover different subjects but the same (or
+overlapping) set of SNPs. The output contains all subjects and the SNPs common
+to every input file.
+
+The example splits the 60-subject file into two 30-subject files using
+`subsetbd`, then merges them back together.
+
+```{r subject_merge, message = FALSE, warning = FALSE}
+bdose_a   <- file.path(tempdir(), "set_a.bdose")
+bdose_b   <- file.path(tempdir(), "set_b.bdose")
+bdose_out <- file.path(tempdir(), "merged_subjects.bdose")
+
+sids <- bd_full$samples$sid
+
+subsetbd(bdfiles    = bdose_full,
+         bdose_file = bdose_a,
+         subjectids = sids[1:30])
+
+subsetbd(bdfiles    = bdose_full,
+         bdose_file = bdose_b,
+         subjectids = sids[31:60])
+
+mergebd(bdose_files = c(bdose_a, bdose_b),
+        bdose_file  = bdose_out)
+
+bd_a   <- getbdinfo(bdose_a)
+bd_b   <- getbdinfo(bdose_b)
+bd_out <- getbdinfo(bdose_out)
+
+cat("File A subjects:", nrow(bd_a$samples),   "\n")
+cat("File B subjects:", nrow(bd_b$samples),   "\n")
+cat("Merged subjects:", nrow(bd_out$samples), "\n")
+cat("Merged SNPs:    ", nrow(bd_out$snps),    "\n")
+```
+
+The merged file contains all 60 subjects and all 10 SNPs.
+
+## Verifying subject order
+
+The subjects in the merged file appear in input-file order: all subjects from
+the first file followed by all subjects from the second file.
+
+```{r subject_order}
+knitr::kable(bd_out$samples, caption = "Subjects in merged file")
+```
+
+# SNP merge
+
+A SNP merge combines files that cover different SNPs but the same (or
+overlapping) set of subjects. The output contains all SNPs and the subjects
+common to every input file.
+
+The example splits the 10-SNP file into two 5-SNP files using `subsetbd`, then
+merges them back together.
+
+```{r snp_merge, message = FALSE, warning = FALSE}
+bdose_snp_a   <- file.path(tempdir(), "snp_a.bdose")
+bdose_snp_b   <- file.path(tempdir(), "snp_b.bdose")
+bdose_snp_out <- file.path(tempdir(), "merged_snps.bdose")
+
+locs <- bd_full$snps$location
+
+subsetbd(bdfiles    = bdose_full,
+         bdose_file = bdose_snp_a,
+         locations  = locs[1:5])
+
+subsetbd(bdfiles    = bdose_full,
+         bdose_file = bdose_snp_b,
+         locations  = locs[6:10])
+
+mergebd(bdose_files = c(bdose_snp_a, bdose_snp_b),
+        bdose_file  = bdose_snp_out)
+
+bd_snp_a   <- getbdinfo(bdose_snp_a)
+bd_snp_b   <- getbdinfo(bdose_snp_b)
+bd_snp_out <- getbdinfo(bdose_snp_out)
+
+cat("File A SNPs:    ", nrow(bd_snp_a$snps),   "\n")
+cat("File B SNPs:    ", nrow(bd_snp_b$snps),   "\n")
+cat("Merged SNPs:    ", nrow(bd_snp_out$snps),  "\n")
+cat("Merged subjects:", nrow(bd_snp_out$samples), "\n")
+```
+
+## Verifying SNP order
+
+SNPs appear in input-file order: all SNPs from the first file followed by all
+SNPs from the second file.
+
+```{r snp_order}
+knitr::kable(bd_snp_out$snps, caption = "SNPs in merged file")
+```
+
+```{r cleanup, include = FALSE}
+unlink(c(bdose_full,    paste0(bdose_full,    ".bdi"),
+         bdose_a,       paste0(bdose_a,       ".bdi"),
+         bdose_b,       paste0(bdose_b,       ".bdi"),
+         bdose_out,     paste0(bdose_out,     ".bdi"),
+         bdose_snp_a,   paste0(bdose_snp_a,   ".bdi"),
+         bdose_snp_b,   paste0(bdose_snp_b,   ".bdi"),
+         bdose_snp_out, paste0(bdose_snp_out, ".bdi")))
+```
diff --git a/vignettes/mergingfiles.Rmd b/vignettes/mergingfiles.Rmd
@@ -18,6 +18,13 @@ knitr::opts_chunk$set(
 library(BinaryDosage)
 ```
 
+**Note:** `bdmerge` is a legacy function that operates on formats 1–4 only.
+For Format 5 files, use `mergebd` instead. See the
+[Merging Format 5 Binary Dosage Files](mergingbd5files.html) vignette for
+details.
+
+---
+
 Quite often subjects have their genotypes imputed in batches. The files returned by these imputation can be converted into binary dosage files. These binary files can be merged into a single file if they have the same SNPs and different subjects using the bdmerge routine.
 
 ## bdmerge