Commit 9fdf426

Merge branch 'dev' into master
2 parents e6750c9 + f577ad1 commit 9fdf426

10 files changed

Lines changed: 231 additions & 169 deletions

.travis.yml

Lines changed: 11 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -42,8 +42,10 @@ script:
4242
- nf-core lint ${TRAVIS_BUILD_DIR}
4343
# Run the basic pipeline with the test profile
4444
- nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker --pairedEnd --saveReference
45+
# Test using PMD tools
46+
- nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker --run_pmdtools --pairedEnd
4547
# Run the basic pipeline with single end data (pretending it's single end, actually)
46-
- nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker --singleEnd --bwa_index results/reference_genome/bwa_index/bwa_index/
48+
- nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker --singleEnd --bwa_index results/reference_genome/bwa_index/BWAIndex/Mammoth_MT_Krause.fasta
4749
# Run the basic pipeline with paired end data without collapsing
4850
- nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker --pairedEnd --skip_collapse --saveReference
4951
# Run the basic pipeline with paired end data without trimming
@@ -53,14 +55,19 @@ script:
5355
# Run the basic pipeline with output unmapped reads as fastq
5456
- nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker --pairedEnd --strip_input_fastq
5557
# Run the same pipeline testing optional step: fastp, complexity
56-
- nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker --pairedEnd --complexity_filter --bwa_index results/reference_genome/bwa_index/bwa_index/
58+
- nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker --pairedEnd --complexity_filter --bwa_index results/reference_genome/bwa_index/BWAIndex/Mammoth_MT_Krause.fasta
5759
# Test BAM Trimming
58-
- nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker --pairedEnd --trim_bam --bwa_index results/reference_genome/bwa_index/bwa_index/
60+
- nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker --pairedEnd --trim_bam --bwa_index results/reference_genome/bwa_index/BWAIndex/Mammoth_MT_Krause.fasta
5961
# Test running with CircularMapper
6062
- nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker --pairedEnd --circularmapper --circulartarget 'NC_007596.2'
6163
# Test running with BWA Mem
62-
- nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker --pairedEnd --bwamem --bwa_index results/reference_genome/bwa_index/bwa_index/
64+
- nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker --pairedEnd --bwamem --bwa_index results/reference_genome/bwa_index/BWAIndex/Mammoth_MT_Krause.fasta
6365
# Test with zipped reference input
6466
- nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker --pairedEnd --fasta 'https://raw.githubusercontent.com/nf-core/test-datasets/eager2/reference/Test.fasta.gz'
6567
# Run the basic pipeline with the bam input profile
6668
- nextflow run ${TRAVIS_BUILD_DIR} -profile testbam,docker --bam
69+
# Run the basic pipeline with FastA reference with `fna` extension
70+
- nextflow run ${TRAVIS_BUILD_DIR} -profile test_fna,docker --pairedEnd --saveReference
71+
# Test using pre-computed indices from a separate run beforehand
72+
- nextflow run ${TRAVIS_BUILD_DIR} -profile test_fna,docker --pairedEnd --bwa_index results/reference_genome/bwa_index/BWAIndex/Mammoth_MT_Krause.fna --fasta_index results/reference_genome/fasta_index/Mammoth_MT_Krause.fna.fai --seq_dict results/reference_genome/seq_dict/Mammoth_MT_Krause.dict
73+

CHANGELOG.md

Lines changed: 8 additions & 1 deletion
@@ -12,12 +12,19 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
1212
* [#186](https://github.com/nf-core/eager/pull/186) - Make FastQC skipping [possible]
1313
/(https://github.com/nf-core/eager/issues/182)
1414
* Merged in [nf-core/tools](https://github.com/nf-core/tools) release V1.6 template changes
15+
* A lot more automated tests using Travis CI
16+
* Don't ignore DamageProfiler errors anymore
1517

1618
### `Fixed`
1719
* [#152](https://github.com/nf-core/eager/pull/152) - DamageProfiler errors [won't crash entire pipeline anymore](https://github.com/nf-core/eager/issues/171)
1820
* [#176](https://github.com/nf-core/eager/pull/176) - Increase runtime for DamageProfiler on [large reference genomes](https://github.com/nf-core/eager/issues/173)
1921
* [#172](https://github.com/nf-core/eager/pull/152) - DamageProfiler errors [won't crash entire pipeline anymore](https://github.com/nf-core/eager/issues/171)
20-
* [#174](https://github.com/nf-core/eager/pull/190) - Publish DeDup files [properly](https://github.com/nf-core/eager/issues/183)
22+
* [#174](https://github.com/nf-core/eager/pull/190) - Publish DeDup files [properly](https://github.com/nf-core/eager/issues/183)
23+
* [#196](https://github.com/nf-core/eager/pull/196) - Fix reference [issues](https://github.com/nf-core/eager/issues/150)
24+
* [#196](https://github.com/nf-core/eager/pull/196) - Fix issues with PE data being mapped incompletely
25+
* [#200](https://github.com/nf-core/eager/pull/200) - Fix minor issue with some [typos](https://github.com/nf-core/eager/pull/196)
26+
* [#210](https://github.com/nf-core/eager/pull/210) - Fix PMDTools [encoding issue](https://github.com/pontussk/PMDtools/issues/6) from `samtools calmd` generated files by running through `samtools view` first
27+
2128

2229
### `Dependencies`
2330

assets/multiqc_config.yaml

Lines changed: 3 additions & 4 deletions
@@ -6,18 +6,17 @@ top_modules:
66
- 'fastqc':
77
name: 'FastQC (pre-AdapterRemoval)'
88
path_filters:
9-
- '*_fastqc.zip'
10-
path_filters_exclude:
11-
- '*.combined.prefixed_fastqc.zip'
9+
- '*_raw_fastqc.zip'
1210
- 'fastp'
1311
- 'adapterRemoval'
1412
- 'fastqc':
1513
name: 'FastQC (post-AdapterRemoval)'
1614
path_filters:
15+
- '*.truncated_fastqc.zip'
1716
- '*.combined*_fastqc.zip'
1817
- 'samtools'
19-
- 'preseq'
2018
- 'dedup'
19+
- 'preseq'
2120
- 'qualimap'
2221
- 'damageprofiler'
2322
- 'gatk'
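
To see how the revised filters split the FastQC reports between the two sections, here is a minimal Python sketch. It assumes the `path_filters` entries behave like shell-style globs (modelled here with `fnmatch`). The file names are hypothetical and only illustrate the naming scheme implied by the patterns above.

```python
from fnmatch import fnmatch

# Patterns copied from the updated multiqc_config.yaml
PRE_FILTERS = ['*_raw_fastqc.zip']
POST_FILTERS = ['*.truncated_fastqc.zip', '*.combined*_fastqc.zip']


def classify(filename):
    """Return which FastQC section a report would land in (illustrative only)."""
    if any(fnmatch(filename, pat) for pat in PRE_FILTERS):
        return 'pre-AdapterRemoval'
    if any(fnmatch(filename, pat) for pat in POST_FILTERS):
        return 'post-AdapterRemoval'
    return 'unmatched'


# Hypothetical file names, not taken from real pipeline output
for name in ['sample_R1_raw_fastqc.zip',
             'sample.pe.combined.prefixed_fastqc.zip',
             'sample_R1.truncated_fastqc.zip']:
    print(name, '->', classify(name))
```

Under the same glob assumption, the earlier `*_fastqc.zip` pattern would have matched all three names, which is presumably why the old configuration needed a separate exclude filter.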

bin/extract_map_reads.py

Lines changed: 23 additions & 11 deletions
@@ -13,16 +13,14 @@ def _get_args():
1313
parser = argparse.ArgumentParser(
1414
prog='extract_mapped_reads',
1515
formatter_class=argparse.RawDescriptionHelpFormatter,
16-
description=f'''
17-
Remove mapped in bam file from fastq files
18-
''')
16+
description="Remove reads mapped in bam file from fastq files")
1917
parser.add_argument('bam_file', help="path to bam file")
2018
parser.add_argument('fwd', help='path to forward fastq file')
2119
parser.add_argument(
22-
'-2',
20+
'-rev',
2321
dest="rev",
2422
default=None,
25-
help="path to forward fastq file")
23+
help="path to reverse fastq file")
2624
parser.add_argument(
2725
'-of',
2826
dest="out_fwd",
@@ -89,6 +87,19 @@ def extract_mapped(bam, processes):
8987
chrs = bamfile.references
9088
except ValueError as e:
9189
print(e)
90+
91+
# Returns an empty list if no reads mapped (because no ref match in bam)
92+
if len(chrs) == 0:
93+
return([])
94+
95+
# Checking that nb_process is not > nb_chromosomes
96+
elif len(chrs) < processes:
97+
print(
98+
f"""Requested {processes} process(es),
99+
but can only be parallelized on {len(chrs)}
100+
processes with these data""")
101+
processes = len(chrs)
102+
92103
extract_mapped_chr_partial = partial(extract_mapped_chr, bam=bam)
93104
p = multiprocessing.Pool(processes)
94105
res = p.map(extract_mapped_chr_partial, chrs)
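
The guard added in this hunk can be illustrated in isolation. The sketch below is not the script's own code: `plan_workers` is a hypothetical helper that reproduces only the decision logic, returning no workers when the BAM file lists no references and otherwise capping the worker count at the number of references.

```python
def plan_workers(references, requested):
    """Decide how many worker processes to start.

    Hypothetical helper mirroring the check in extract_mapped():
    returns 0 when there are no references, otherwise at most one
    worker per reference.
    """
    n_refs = len(references)
    if n_refs == 0:
        # Nothing to extract, so no pool is needed
        return 0
    if n_refs < requested:
        print(f"Requested {requested} process(es), "
              f"but only {n_refs} can be used with these data")
        return n_refs
    return requested
```

Capping matters because `Pool.map` hands out one task per reference here, so any workers beyond that number would sit idle.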
@@ -163,8 +174,8 @@ def write_fq(fq_dict, fname, mode):
163174
"""
164175
Write to fastq file
165176
INPUT:
166-
- fq_dict(dict) dictionary with unmapped read names as keys, seq and quality as values
167-
in a list
177+
- fq_dict(dict) dictionary with unmapped read names as keys,
178+
unmapped/mapped (u|m), seq, and quality as values in a list
168179
- fname(string) Path to output fastq file
169180
- mode(string) strip (remove read) or replace (replace read sequence) by Ns
170181
"""
@@ -218,13 +229,14 @@ def write_fq(fq_dict, fname, mode):
218229
f.write(f"{i}\n")
219230

220231

232+
def check_strip_mode(mode):
233+
if mode.lower() not in ['replace', 'strip']:
234+
print(f"Mode must be {' or '.join(['replace', 'strip'])}")
235+
236+
221237
if __name__ == "__main__":
222238
BAM, IN_FWD, IN_REV, OUT_FWD, OUT_REV, MODE, PROC = _get_args()
223239

224-
if IN_REV and not OUT_REV:
225-
print('You specified an input reverse fastq, but no output reverse fastq')
226-
sys.exit(1)
227-
228240
if OUT_FWD == None:
229241
out_fwd = f"{IN_FWD.split('/')[-1].split('.')[0]}.r1.fq.gz"
230242
else:
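
As a side note on the `check_strip_mode` helper added above, one possible shape for the validation is sketched here. Exiting with a non-zero status and returning the normalised mode are assumptions made for illustration. The diff itself only shows a message being printed.

```python
import sys

ALLOWED_MODES = ('replace', 'strip')


def check_strip_mode(mode):
    """Return the normalised mode, or exit if it is not allowed."""
    normalised = mode.lower()
    if normalised not in ALLOWED_MODES:
        # Join the allowed values, not the characters of the user's input
        print(f"Mode must be {' or '.join(ALLOWED_MODES)}")
        sys.exit(1)  # assumption: treat an invalid mode as a fatal error
    return normalised
```

Joining over the tuple of allowed values keeps the error message in sync with the check if another mode is ever added.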

conf/base.config

Lines changed: 2 additions & 5 deletions
@@ -61,13 +61,10 @@ process {
6161
withName: multiqc {
6262
errorStrategy = { task.exitStatus in [143,137] ? 'retry' : 'ignore' }
6363
}
64-
6564
withName: damageprofiler {
66-
errorStrategy = 'ignore'
67-
params.large_ref ? "time = { check_max(8.h * task.attempt, 'time') }" : "time = { check_max(2.h * task.attempt, 'time') }"
65+
time = params.large_ref ? { check_max(8.h * task.attempt, 'time') } : { check_max(2.h * task.attempt, 'time')}
6866
}
69-
70-
withName: extract_unmapped_reads {
67+
withName: strip_input_fastq {
7168
cpus = { check_max(8 * task.attempt, 'cpus') }
7269
memory = { check_max( 8.GB * task.attempt, 'memory' ) }
7370
}

conf/test_fna.config

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
1+
/*
2+
* -------------------------------------------------
3+
* Nextflow config file for running tests
4+
* -------------------------------------------------
5+
* Defines bundled input files and everything required
6+
* to run a fast and simple test. Use as follows:
7+
* nextflow run nf-core/eager -profile test_fna,docker (or singularity, or conda)
8+
*/
9+
10+
params {
11+
config_profile_name = 'Test profile'
12+
config_profile_description = 'Minimal test dataset to check pipeline function'
13+
// Limit resources so that this can run on Travis
14+
max_cpus = 2
15+
max_memory = 6.GB
16+
max_time = 48.h
17+
genome = "Custom"
18+
//Input data
19+
singleEnd = false
20+
readPaths = [['JK2782_TGGCCGATCAACGA_L008', ['https://github.com/nf-core/test-datasets/raw/eager2/testdata/Mammoth/JK2782_TGGCCGATCAACGA_L008_R1_001.fastq.gz.tengrand.fq.gz','https://github.com/nf-core/test-datasets/raw/eager2/testdata/Mammoth/JK2782_TGGCCGATCAACGA_L008_R2_001.fastq.gz.tengrand.fq.gz']],
21+
['JK2785_TGGCCGATCAACGA_L008', ['https://github.com/nf-core/test-datasets/raw/eager2/testdata/Mammoth/JK2785_TGGCCGATCAACGA_L008_R1_001.fastq.gz.tengrand.fq.gz','https://github.com/nf-core/test-datasets/raw/eager2/testdata/Mammoth/JK2785_TGGCCGATCAACGA_L008_R2_001.fastq.gz.tengrand.fq.gz']],
22+
]
23+
// Genome references
24+
fasta = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager2/reference/Mammoth_MT_Krause.fna'
25+
}

docs/usage.md

Lines changed: 47 additions & 13 deletions
@@ -5,7 +5,7 @@
55
<!-- Install Atom plugin markdown-toc-auto for this ToC to auto-update on save -->
66
<!-- TOC START min:2 max:3 link:true asterisk:true update:true -->
77
* [Table of contents](#table-of-contents)
8-
* [Introduction](#introduction)
8+
* [Introduction](#general-nextflow-info)
99
* [Running the pipeline](#running-the-pipeline)
1010
* [Updating the pipeline](#updating-the-pipeline)
1111
* [Reproducibility](#reproducibility)
@@ -168,12 +168,14 @@ A normal glob pattern, enclosed in quotation marks, can then be used for `--read
168168
```
169169

170170
### `--fasta`
171-
If you prefer, you can specify the full path to your reference genome when you run the pipeline:
171+
Specify the full path to your reference genome here. The FASTA file can have any file suffix, such as `.fasta`, `.fna`, `.fa`, `.FastA` etc. You may also supply a gzipped reference file, which will be unzipped automatically for you.
172+
173+
For example:
172174

173175
```bash
174-
--fasta '[path to Fasta reference]'
176+
--fasta '/<path>/<to>/my_reference.fasta'
175177
```
176-
> If you don't specify appropriate `--bwa_index`, `--fasta_index` parameters, the pipeline will create these indices for you automatically. Note, that saving these for later has to be turned on using `--saveReference`. You may also specify the path to a gzipped (`*.gz` file extension) FastA as reference genome - this will be uncompressed by the pipeline automatically for you. Note that other file extensions such as `.fna`, `.fa` are also supported but will be renamed to `.fasta` automatically by the pipeline.
178+
> If you don't specify appropriate `--bwa_index`, `--fasta_index` parameters (see [below](#optional-reference-options)), the pipeline will create these indices for you automatically. Note that you can save the indices created for you for later by giving the `--saveReference` flag.
177179
178180
### `--large_ref`
179181

@@ -214,23 +216,55 @@ params {
214216
}
215217
```
216218

217-
### Optional Reference Utility Files
219+
## Optional Reference Options
220+
221+
### Generating Fresh Indices
222+
223+
#### `--saveReference`
224+
225+
Use this if you do not have pre-made reference FASTA indices for `bwa`, `samtools` and `picard`. If you turn this on, the indices EAGER2 generates will be stored in `<your_output_dir>/results/reference_genomes`.
226+
227+
### Premade Indices
228+
229+
Supplying pre-made indices saves time in pipeline execution and is especially advised when running multiple times on the same cluster system. You can even add a resource-specific profile that sets the paths to pre-computed reference genomes, saving even more time because you no longer need to specify them on each run.
230+
231+
#### `--bwa_index`
232+
233+
If you want to use pre-existing `bwa index` indices, please supply the path **and file** to the FASTA you also specified in `--fasta` (see above). EAGER2 will automagically detect the index files by searching for the FASTA filename with the corresponding `bwa` index file suffixes.
234+
235+
For example:
236+
237+
```
238+
nextflow run nf-core/eager \
239+
-profile test_fna,docker \
240+
--pairedEnd \
241+
--reads '*{R1,R2}*.fq.gz' \
242+
--fasta results/reference_genome/bwa_index/BWAIndex/Mammoth_MT_Krause.fasta \
243+
--bwa_index results/reference_genome/bwa_index/BWAIndex/Mammoth_MT_Krause.fasta
244+
```
245+
246+
> `bwa index` does not give you an option to supply alternative suffixes/names for these indices. Thus, the file names generated by this command _must not_ be changed, otherwise EAGER2 will not be able to find them.
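
Before launching a run with `--bwa_index`, it can help to confirm that the index files really sit next to the FASTA. The following is a minimal sketch of such a check, not EAGER2's own detection logic: `missing_bwa_index_files` is a hypothetical helper, and it assumes the standard `bwa index` output suffixes (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`) appended to the full FASTA file name.

```python
from pathlib import Path

# Suffixes that `bwa index` appends to the FASTA file name
BWA_INDEX_SUFFIXES = ('.amb', '.ann', '.bwt', '.pac', '.sa')


def missing_bwa_index_files(fasta):
    """List the expected bwa index files that do not exist next to `fasta`."""
    fasta = Path(fasta)
    expected = [Path(f"{fasta}{suffix}") for suffix in BWA_INDEX_SUFFIXES]
    return [str(p) for p in expected if not p.is_file()]


if __name__ == "__main__":
    # Path taken from the example above; adjust to your own output directory
    fasta = 'results/reference_genome/bwa_index/BWAIndex/Mammoth_MT_Krause.fasta'
    missing = missing_bwa_index_files(fasta)
    print('All index files found' if not missing else f'Missing: {missing}')
```

An empty list means every expected index file is present under its unchanged name.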
218247
219-
### `--bwa_index`
248+
#### `--seq_dict`
220249

221-
Use this to specify a _directory_ containing previously created BWA index files. This saves time in pipeline execution and is especially advised when running multiple times on the same cluster system for example. You can even add a resource specific profile that sets paths to pre-computed reference genomes, saving even time when specifying these.
250+
If you want to use a pre-existing `picard CreateSequenceDictionary` dictionary file, use this to specify the required `.dict` file for the selected reference genome.
222251

223-
### `--seq_dict` false
252+
For example:
224253

225-
Use this to specify the required sequence dictionary file for the selected reference genome.
254+
```
255+
--seq_dict Mammoth_MT_Krause.dict
256+
```
257+
258+
#### `--fasta_index`
226259

227-
### `--fasta_index` false
260+
If you want to use a pre-existing `samtools faidx` index, use this to specify the required FASTA index file for the selected reference genome. This should be generated by `samtools faidx` and have a file suffix of `.fai`.
228261

229-
Use this to specify the required FastA index file for the selected reference genome.
262+
For example:
230263

231-
### `--saveReference` false
264+
```
265+
--fasta_index Mammoth_MT_Krause.fasta.fai
266+
```
232267

233-
If you turn this on, the generated indices will be stored in the `./results/reference_genomes` for you.
234268

235269
## Other command line parameters
236270

environment.yml

Lines changed: 2 additions & 2 deletions
@@ -17,7 +17,7 @@ dependencies:
1717
- bioconda::gatk4=4.1.1.0
1818
- bioconda::qualimap=2.2.2b
1919
- bioconda::vcf2genome=0.91
20-
- bioconda::damageprofiler=0.4.5
20+
- bioconda::damageprofiler=0.4.6
2121
- bioconda::multiqc=1.7
2222
- bioconda::pmdtools=0.60
2323
- conda-forge::r-rmarkdown=1.12
@@ -29,5 +29,5 @@ dependencies:
2929
- bioconda::bamutil=1.0.14
3030
- bioconda::mtnucratio=0.5
3131
- pysam=0.15.2
32-
- python=3.6
32+
- python=3.6.3
3333
#Missing Schmutzi,snpAD
