
Commit f703445

Merge branch 'dev' into post-map-lenfilter
2 parents fc571f0 + f3ca5fc commit f703445

7 files changed

Lines changed: 138 additions & 107 deletions


.github/workflows/ci.yml

Lines changed: 2 additions & 2 deletions
@@ -112,7 +112,7 @@ jobs:
 nextflow run ${GITHUB_WORKSPACE} -profile test_tsv,docker --clip_readlength 0 --run_bam_filtering --bam_filter_minreadlength 50
 - name: DEDUPLICATION Test with dedup
 run: |
-nextflow run ${GITHUB_WORKSPACE} -profile test_tsv,docker --dedupper 'dedup'
+nextflow run ${GITHUB_WORKSPACE} -profile test_tsv,docker --dedupper 'dedup' --dedup_all_merged
 - name: GENOTYPING_HC Test running GATK HaplotypeCaller
 run: |
 nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_fna,docker --run_genotyping --genotyping_tool 'hc' --gatk_out_mode 'EMIT_ALL_SITES' --gatk_hc_emitrefconf 'BP_RESOLUTION'
@@ -131,7 +131,7 @@ jobs:
 - name: TRIMBAM Test bamutils works alone
 run: |
 nextflow run ${GITHUB_WORKSPACE} -profile test_tsv,docker --run_trim_bam
-- name: TRIMBAM Test PMDtools works alone
+- name: PMDTOOLS Test PMDtools works alone
 run: |
 nextflow run ${GITHUB_WORKSPACE} -profile test_tsv,docker --run_pmdtools
 - name: GATK 3.5 Download resource files

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -28,6 +28,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
 * Nuclear contamination results are now shown in the MultiQC report.
 * Tutorial on how to use profiles for reproducible science (i.e. parameter sharing between different groups)
 * [#522](https://github.com/nf-core/eager/issues/522) Added post-mapping length filter to asisst in more realistic endogenous DNA calculations
+* [#512](https://github.com/nf-core/eager/issues/512) Added flexible trimming of bams by library type. 'half' and 'none' UDG libraries can now be trimmed differentially within a single eager run.
 
 ### `Fixed`
 
@@ -42,11 +43,14 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
 * [#473](https://github.com/nf-core/eager/issues/473) - Fixed bug in sexdet_process on AWS
 * [#444](https://github.com/nf-core/eager/issues/444) - Provide option for preserving realigned bam + index
 * Increase MultiQC process memory requirements to ensure enough memory for large runs
+* Fixed deduplication output logic. Will now pass along only the post-rmdup bams if duplicate removal is not skipped, instead of both the post-rmdup and pre-rmdup bams.
 * [#497](https://github.com/nf-core/eager/issues/497) - Simplifies number of parameters required to run bam filtering
 * [#501](https://github.com/nf-core/eager/issues/501) - Adds additional validation checks for MALT/MaltExtract database input files
 * [#508](https://github.com/nf-core/eager/issues/508) - Made Markduplicates default dedupper due to narrower context specificity of dedup
 * [#516](https://github.com/nf-core/eager/issues/516) - Made bedtools not report out of memory exit code when warning of inconsistant FASTA/Bed entry names
 * [#504](https://github.com/nf-core/eager/issues/504) - Removed uninformative sexdeterrmine-snps plot from MultiQC report.
+* Nuclear contamination is now reported with the correct library names.
+
 
 ### `Dependencies`
 
bin/print_x_contamination.py

Lines changed: 2 additions & 2 deletions
@@ -37,8 +37,8 @@ def make_float(x):
 Ind=re.sub('\.X.contamination.out$', '', fn).split("/")[-1]
 for line in f:
     fields=line.strip().split()
-    if line.strip()[0:19] == "We have nSNP sites:":
-        nSNPs=fields[4][:-1]
+    if line.strip()[0:21] == "[readicnts] Has read:":
+        nSNPs=fields[4]
     elif line.strip()[0:7] == "Method1" and line.strip()[9:16] == 'new_llh':
         mom1=fields[3].split(":")[1]
         err_mom1=fields[4].split(":")[1]
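The fix above swaps one hard-coded slice/literal pair for another, and it only works because each slice length exactly matches its literal (19 characters for the old prefix, 21 for the new one). The minimal sketch below, with hypothetical helper names that are not part of the script, shows that `str.startswith` performs the same test without a separately counted length, and that an off-by-one slice silently never matches. The sample line is invented for illustration and is not real output of the contamination tool.

```python
def matches_by_slice(line: str, prefix: str, length: int) -> bool:
    """The script's style: compare a hand-counted slice with a literal."""
    return line.strip()[0:length] == prefix


def matches_by_startswith(line: str, prefix: str) -> bool:
    """Same test, without a separate length that can drift out of sync."""
    return line.strip().startswith(prefix)


OLD_PREFIX = "We have nSNP sites:"    # 19 characters, hence [0:19] in the removed line
NEW_PREFIX = "[readicnts] Has read:"  # 21 characters, hence [0:21] in the added line

# Invented line for illustration only; not real contamination-tool output.
line = "  [readicnts] Has read: example text\n"

print(len(OLD_PREFIX), len(NEW_PREFIX))         # 19 21
print(matches_by_slice(line, NEW_PREFIX, 21))   # True
print(matches_by_startswith(line, NEW_PREFIX))  # True
print(matches_by_slice(line, NEW_PREFIX, 20))   # False: one character short never matches
```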

docs/output.md

Lines changed: 1 addition & 1 deletion
@@ -672,7 +672,7 @@ Each module has it's own output directory which sit alongside the `MultiQC/` dir
 - `qualimap/` - this contains a sub-directory for every sample, which includes a qualimap report and associated raw statistic files. You can open the `.html` file in your internet browser to see the in-depth report (this will be more detailed than in MultiQC). This includes stuff like percent coverage, depth coverage, GC content and so on of your mapped reads.
 - `damageprofiler/` - this contains sample specific directories containing raw statistics and damage plots from DamageProfiler. The `.pdf` files can be used to visualise C to T miscoding lesions or read length distributions of your mapped reads. All raw statistics used for the PDF plots are contained in the `.txt` files.
 - `pmdtools/` this contains raw output statistics of pmdtools (estimates of frequencies of substitutions), and BAM files which have been filtered to remove reads that do not have a Post-mortem damage (PMD) score of `--pmdtools_threshold`. The BAM files do not have corresponding index files.
-- `trimmed_bam/` this contains the BAM files with X number of bases trimmed off as defined with the `--bamutils_clip_left` and `--bamutils_clip_right` flags and corresponding index files. You can use these BAM files for downstream analysis such as re-mapping data with more stringent parameters (if you set trimming to remove the most likely places containing damage in the read).
+- `trimmed_bam/` this contains the BAM files with X number of bases trimmed off as defined with the `--bamutils_clip_half_udg_left`, `--bamutils_clip_half_udg_right`, `--bamutils_clip_none_udg_left`, and `--bamutils_clip_none_udg_right` flags and corresponding index files. You can use these BAM files for downstream analysis such as re-mapping data with more stringent parameters (if you set trimming to remove the most likely places containing damage in the read).
 - `genotyping/` this contains all the (gzipped) genotyping files produced by your genotyping module. The file suffix will have the genotyping tool name. You will have files corresponding to each of your deduplicated BAM files (except pileupcaller), or any turned-on downstream processes that create BAMs (e.g. trimmed bams or pmd tools). If `--gatk_ug_keep_realign_bam` supplied, this may also contain BAM files from InDel realignment when using GATK 3 and UnifiedGenotyping for variant calling.
 - `MultiVCFAnalyzer/` this contains all output from MultiVCFAnalyzer, including SNP calling statistics, various SNP table(s) and FASTA alignment files.
 - `sex_determination/` this contains the output for the sex determination run. This is a single `.tsv` file that includes a table with the Sample Name, the Nr of Autosomal SNPs, Nr of SNPs on the X/Y chromosome, the Nr of reads mapping to the Autosomes, the Nr of reads mapping to the X/Y chromosome, the relative coverage on the X/Y chromosomes, and the standard error associated with the relative coverages. These measures are provided for each bam file, one row per bam. If the `sexdeterrmine_bedfile` option has not been provided, the error bars cannot be trusted, and runtime will be considerably longer.

docs/usage.md

Lines changed: 21 additions & 6 deletions
@@ -590,7 +590,11 @@ Turns off quality based trimming at the 5p end of reads when any of the --trimns
 
 #### `--mergedonly`
 
-This flag means that only merged reads are sent downstream for analysis. Singletons (i.e. reads missing a pair), or un-merged reads (where there wasn't sufficient overlap) are discarded. You may want to use this if you want ensure only the best quality reads for your analysis, but with the penalty of potentially losing still valid data (even if some reads have slightly lower quality).
+Specify that only merged reads are sent downstream for analysis.
+
+Singletons (i.e. reads missing a pair), or un-merged reads (where there wasn't sufficient overlap) are discarded.
+
+You may want to use this if you want ensure only the best quality reads for your analysis, but with the penalty of potentially losing still valid data (even if some reads have slightly lower quality). It is highly recommended when using `--dedupper 'dedup'` (see below).
 
 ### Read Mapping Parameters
 
@@ -715,11 +719,18 @@ If using TSV input, deduplication is performed library, i.e. after lane merging.
 
 #### `--dedupper`
 
-Sets the duplicate read removal tool. By default uses `markduplicates` from Picard. Alternatively an ancient DNA specific read deduplication tool 'dedup' ([Pelter et al. 2016](http://dx.doi.org/10.1186/s13059-016-0918-z)) is offered. This utilises both ends of paired-end data to remove duplicates (i.e. true exact duplicates, as markduplicates will over-zealously deduplicate anything with the same starting position even if the ends are different). DeDup should only be used solely on paired-end data otherwise suboptimal deduplication can occur if applied to either single-end or a mix of single-end/paired-end data.
+Sets the duplicate read removal tool. By default uses `markduplicates` from Picard. Alternatively an ancient DNA specific read deduplication tool 'dedup' ([Pelter et al. 2016](http://dx.doi.org/10.1186/s13059-016-0918-z)) is offered.
+
+This utilises both ends of paired-end data to remove duplicates (i.e. true exact duplicates, as markduplicates will over-zealously deduplicate anything with the same starting position even if the ends are different). DeDup should only be used solely on paired-end data otherwise suboptimal deduplication can occur if applied to either single-end or a mix of single-end/paired-end data.
+
+Note that if you run without the `--mergedonly` flag for AdapterRemoval, DeDup will
+likely fail. If you absolutely want to use both PE and SE data, you can supply the
+`--dedup_all_merged` flag to consider singletons to also be merged paired-end reads. This
+may result in over-zealous deduplication.
 
 #### `--dedup_all_merged`
 
-Sets DeDup to treat all reads as merged reads. This is useful if reads are for example not prefixed with `M_` in all cases.
+Sets DeDup to treat all reads as merged reads. This is useful if reads are for example not prefixed with `M_` in all cases. Therefore, this can be used as a workaround when also using a mixture of paired-end and single-end data, however this is not recommended (see above).
 
 ### Library Complexity Estimation Parameters
 
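The `--dedupper` text contrasts deduplicating on the starting position alone with deduplicating on both ends of a merged read. The sketch below is a deliberately simplified illustration of that difference, using a hypothetical `Read` tuple; it is not how Picard MarkDuplicates or DeDup are implemented (for example, it keys on the leftmost position for both strands and simply keeps the first read seen rather than the best-quality one).

```python
from typing import Iterable, List, NamedTuple


class Read(NamedTuple):
    """Hypothetical stand-in for an aligned, merged read."""
    name: str
    contig: str
    start: int    # leftmost aligned position
    end: int      # rightmost aligned position
    reverse: bool


def dedup(reads: Iterable[Read], use_both_ends: bool) -> List[Read]:
    """Keep the first read seen for each duplicate key.

    use_both_ends=False: key on contig, start and strand only.
    use_both_ends=True:  also key on the end, so reads sharing a start but
                         ending elsewhere are kept as distinct molecules.
    """
    seen = set()
    kept = []
    for read in reads:
        key = (read.contig, read.start, read.reverse)
        if use_both_ends:
            key += (read.end,)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


reads = [
    Read("r1", "chr1", 100, 150, False),
    Read("r2", "chr1", 100, 150, False),  # exact duplicate of r1
    Read("r3", "chr1", 100, 180, False),  # same start as r1, different end
]
print([r.name for r in dedup(reads, use_both_ends=False)])  # ['r1']
print([r.name for r in dedup(reads, use_both_ends=True)])   # ['r1', 'r3']
```

With start-only keys, `r3` is discarded even though it ends elsewhere, which is the over-zealous behaviour the documentation attributes to position-only deduplication of merged reads.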
@@ -784,13 +795,17 @@ More documentation can be seen in the [bamUtil documentation](https://genome.sph
 
 Turns on the BAM trimming method. Trims off `[n]` bases from reads in the deduplicated BAM file Damage assessment in PMDTools or DamageProfiler remains untouched, as data is routed through this independently. BAM trimming os typically performed to reduce errors during genotyping that can be caused by aDNA damage.
 
-BAM trimming will only be performed on libraries indicated as `--udg_type 'none'` or `--udg_type 'half'`. Complete UDG treatment ('full') should already have all damage removed. The amount of bases that will be trimmed off (see `--bamutils_clip_left` / `--bamutils_clip_right`) will be the same regardless whether `'none'` of `'half'`.
+BAM trimming will only be performed on libraries indicated as `--udg_type 'none'` or `--udg_type 'half'`. Complete UDG treatment ('full') should have removed all damage. The amount of bases that will be trimmed off can be set separately for libraries with `--udg_type` `'none'` and `'half'` (see `--bamutils_clip_half_udg_left` / `--bamutils_clip_half_udg_right` / `--bamutils_clip_none_udg_left` / `--bamutils_clip_none_udg_right`).
 
 > Note: additional artefacts such as bar-codes or adapters that could potentially also be trimmed should be removed prior mapping.
 
-#### `--bamutils_clip_left` / `--bamutils_clip_right`
+#### `--bamutils_clip_half_udg_left` / `--bamutils_clip_half_udg_right`
+
+Default set to `1` and clips off one base of the left or right side of reads from libraries whose UDG treatment is set to `half`. Note that reverse reads will automatically be clipped off at the reverse side with this (automatically reverses left and right for the reverse read).
+
+#### `--bamutils_clip_none_udg_left` / `--bamutils_clip_none_udg_right`
 
-Default set to `1` and clips off one base of the left or right side of reads. Note that reverse reads will automatically be clipped off at the reverse side with this (automatically reverses left and right for the reverse read).
+Default set to `1` and clips off one base of the left or right side of reads from libraries whose UDG treatment is set to `none`. Note that reverse reads will automatically be clipped off at the reverse side with this (automatically reverses left and right for the reverse read).
 
 #### `--bamutils_softclip`
 
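The usage.md changes above define per-library clipping: `'half'` and `'none'` libraries each get their own left/right lengths (all defaulting to `1`), `'full'` libraries are not trimmed, and left and right are swapped for reverse reads. The sketch below restates that rule in plain Python under those documented assumptions only. The function names are hypothetical; the pipeline itself does this in Nextflow by calling bamUtil on BAM records, whereas this sketch just slices a sequence string and ignores soft-clipping (`--bamutils_softclip`).

```python
from typing import Dict, Optional, Tuple

# Documented defaults for the four new parameters (each defaults to 1).
DEFAULT_CLIP: Dict[str, int] = {
    "bamutils_clip_half_udg_left": 1,
    "bamutils_clip_half_udg_right": 1,
    "bamutils_clip_none_udg_left": 1,
    "bamutils_clip_none_udg_right": 1,
}


def clip_lengths(udg_type: str,
                 params: Dict[str, int] = DEFAULT_CLIP) -> Optional[Tuple[int, int]]:
    """Return (left, right) bases to clip for a library, or None if it is not trimmed."""
    if udg_type == "full":
        return None  # full UDG treatment: damage already removed, so no trimming
    if udg_type not in ("half", "none"):
        raise ValueError(f"unknown udg_type: {udg_type!r}")
    return (params[f"bamutils_clip_{udg_type}_udg_left"],
            params[f"bamutils_clip_{udg_type}_udg_right"])


def clip_read(seq: str, left: int, right: int, reverse: bool) -> str:
    """Trim a read's sequence; left and right are swapped for reverse reads."""
    if reverse:
        left, right = right, left
    return seq[left:len(seq) - right]  # len(seq) - right avoids the seq[:-0] pitfall


# Hypothetical override: clip three bases from the left of 'none' libraries.
params = dict(DEFAULT_CLIP, bamutils_clip_none_udg_left=3)
print(clip_lengths("half", params))                    # (1, 1)
print(clip_lengths("none", params))                    # (3, 1)
print(clip_lengths("full", params))                    # None
print(clip_read("ACGTACGTAC", 3, 1, reverse=False))    # TACGTA
print(clip_read("ACGTACGTAC", 3, 1, reverse=True))     # CGTACG
```

Keying the lookup on the UDG type is what lets `'half'` and `'none'` libraries in the same run be trimmed differently, which is the behaviour the #512 changelog entry describes.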