Commit a29e148
Fix outer barcode matching: consensus from multiple reads, fuzzy matching, samplesheet-authoritative metrics
Three fixes for outer barcode (barcode_1/barcode_2) matching in splitcode_demux_fastqs:
1. Consensus barcode from multiple reads: Instead of trusting a single read
header to determine the FASTQ outer barcodes, read the first N reads
(default 10) and take a position-wise majority vote. This avoids failures
when the first read has a mismatched index sequence.
2. Fuzzy matching with configurable mismatch tolerance: DRAGEN demux tolerates
index mismatches, so FASTQ reads may carry barcodes that differ by 1-2 bases
from the samplesheet. The new barcode_matches_fuzzy() function counts non-N
mismatches and accepts matches within a threshold (default: 1). When multiple
samplesheet entries match, the one with fewest mismatches is preferred.
3. Samplesheet-authoritative barcodes in output: Picard-style metrics and
platform_unit values now use the samplesheet barcode values (the ground
truth) instead of FASTQ-observed values that may contain Ns or mismatches.
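The three fixes above can be sketched together in Python as follows. This is a minimal illustration, not the code from this commit: parse_index_field, majority_vote, consensus_barcodes, count_mismatches, best_match and platform_unit are hypothetical names, and the commit's own barcode_matches_fuzzy() may have a different signature and behavior. Several details are assumptions: headers are taken to use the Illumina CASAVA 1.8+ layout with the index field written as "barcode_1+barcode_2"; barcodes of different lengths are treated as non-matching; the mismatch threshold is applied to each barcode separately, and the lowest total wins, with ties going to the earlier samplesheet entry; and the platform unit follows the common {flowcell}.{lane}.{barcode} read-group convention, with dual indexes joined by "-".

```python
from collections import Counter
from itertools import islice
from typing import Dict, Iterable, List, Optional, Tuple

Barcodes = Tuple[str, str]  # (barcode_1, barcode_2); barcode_2 is "" for single index


def parse_index_field(header: str) -> Barcodes:
    """Split the index field of a CASAVA 1.8+ header (assumed layout), e.g.
    '@M1:1:FC:1:1101:1:1 1:N:0:ACGTACGT+TTGCAAGG' -> ('ACGTACGT', 'TTGCAAGG')."""
    index = header.rstrip().split(" ")[-1].split(":")[-1]
    bc1, _, bc2 = index.partition("+")
    return bc1, bc2


def majority_vote(seqs: List[str]) -> str:
    """Position-wise majority over the sequences of the most common length.
    On a tie, the base seen first wins."""
    modal_len = Counter(len(s) for s in seqs).most_common(1)[0][0]
    aligned = [s for s in seqs if len(s) == modal_len]
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


def consensus_barcodes(headers: Iterable[str], num_reads: int = 10) -> Barcodes:
    """Fix 1: consensus outer barcodes from the first num_reads read headers."""
    pairs = [parse_index_field(h) for h in islice(headers, num_reads)]
    if not pairs:
        raise ValueError("no read headers to build a consensus from")
    return majority_vote([p[0] for p in pairs]), majority_vote([p[1] for p in pairs])


def count_mismatches(observed: str, expected: str) -> Optional[int]:
    """Count positions where neither base is N and the bases differ.
    Returns None when lengths differ (assumed here to mean no match)."""
    if len(observed) != len(expected):
        return None
    return sum(1 for a, b in zip(observed.upper(), expected.upper())
               if a != b and "N" not in (a, b))


def best_match(observed: Barcodes, samplesheet: Dict[str, Barcodes],
               max_mismatches: int = 1) -> Optional[str]:
    """Fix 2: return the samplesheet entry with the fewest total mismatches,
    requiring each barcode to be within max_mismatches; None if none qualify."""
    best_name, best_total = None, None
    for name, expected in samplesheet.items():
        mms = [count_mismatches(o, e) for o, e in zip(observed, expected)]
        if any(m is None or m > max_mismatches for m in mms):
            continue
        total = sum(mms)
        if best_total is None or total < best_total:  # strict '<': earlier entry wins ties
            best_name, best_total = name, total
    return best_name


def platform_unit(flowcell: str, lane: int, samplesheet_bc: Barcodes) -> str:
    """Fix 3: build the PU from the samplesheet barcodes, never the observed ones."""
    bc1, bc2 = samplesheet_bc
    return f"{flowcell}.{lane}.{bc1}-{bc2}" if bc2 else f"{flowcell}.{lane}.{bc1}"
```

For example, if the first read carries an N, the consensus still recovers the intended barcode. The matched sample's samplesheet barcodes, not the observed ones, then go into the platform unit:

```python
headers = [
    "@M1:1:FC:1:1101:1:1 1:N:0:ACGTNCGT+TTGCAAGG",  # first read has an N
    "@M1:1:FC:1:1101:1:2 1:N:0:ACGTACGT+TTGCAAGG",
    "@M1:1:FC:1:1101:1:3 1:N:0:ACGTACGT+TTGCAAGG",
]
sheet = {"S1": ("ACGTACGT", "TTGCAAGG"), "S2": ("ACGTACGA", "TTGCAAGG")}
observed = consensus_barcodes(headers)       # ('ACGTACGT', 'TTGCAAGG')
sample = best_match(observed, sheet)         # 'S1' (0 mismatches vs 1 for S2)
pu = platform_unit("FC", 1, sheet[sample])   # 'FC.1.ACGTACGT-TTGCAAGG'
```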
New CLI parameters: --max_barcode_mismatches, --num_reads_for_barcode
1 parent 48ea324
2 files changed
Lines changed: 588 additions & 78 deletions