Commit 3cf7872

Merge pull request #396 from nf-core/tsv-input
Add TSV input functionality and extra goodies.
2 parents 1600d8b + 8aec6d3 commit 3cf7872

71 files changed

Lines changed: 29647 additions & 3435 deletions

.github/workflows/awstest.yml

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+name: nf-core AWS test
+# This workflow is triggered on push to the master branch and on published releases.
+# It runs the -profile 'test' on AWS batch
+
+on:
+  push:
+    branches:
+      - master
+  release:
+    types: [published]
+
+jobs:
+  run-awstest:
+    name: Run AWS test
+    runs-on: ubuntu-latest
+    steps:
+      - name: Setup Miniconda
+        uses: goanpeca/setup-miniconda@v1.0.2
+        with:
+          auto-update-conda: true
+          python-version: 3.7
+      - name: Install awscli
+        run: conda install -c conda-forge awscli
+      - name: Start AWS batch job
+        env:
+          AWS_ACCESS_KEY_ID: ${{secrets.AWS_KEY_ID}}
+          AWS_SECRET_ACCESS_KEY: ${{secrets.AWS_KEY_SECRET}}
+          TOWER_ACCESS_TOKEN: ${{secrets.TOWER_ACCESS_TOKEN}}
+        run: |
+          aws batch submit-job --region eu-west-1 --job-name nf-core-eager --job-queue 'default-8b3836e0-5eda-11ea-96e5-0a2c3f6a2a32' --job-definition nextflow --container-overrides '{"command": ["nf-core/eager", "-r '"${GITHUB_SHA}"' -profile test --outdir s3://nf-core-awsmegatests/eager/results-'"${GITHUB_SHA}"' -w s3://nf-core-awsmegatests/eager/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}]}'
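
The `run` step in this workflow builds the `--container-overrides` JSON by splicing shell variables into a single-quoted string, which is hard to read. As a rough illustration of the payload's structure only, the sketch below builds the same JSON in Python. The function name `build_container_overrides` and its arguments are hypothetical and are not part of the pipeline; the bucket path and option string are copied from the workflow.

```python
import json


def build_container_overrides(git_sha: str, tower_token: str) -> str:
    """Return a JSON string shaped like the workflow's --container-overrides value.

    Hypothetical helper for illustration; the workflow itself uses shell quoting.
    """
    bucket = "s3://nf-core-awsmegatests/eager"
    # The workflow passes all Nextflow options as one string after the pipeline name.
    nextflow_args = (
        f"-r {git_sha} -profile test "
        f"--outdir {bucket}/results-{git_sha} "
        f"-w {bucket}/work-{git_sha} -with-tower"
    )
    payload = {
        "command": ["nf-core/eager", nextflow_args],
        "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": tower_token}],
    }
    return json.dumps(payload)
```

Serialising with `json.dumps` avoids the nested quote juggling and escapes any special characters in the token, which is the main reason one might prefer this shape over hand-built JSON.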

.github/workflows/ci.yml

Lines changed: 49 additions & 32 deletions
Large diffs are not rendered by default.

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -6,3 +6,4 @@ results/
 tests/
 testing/
 *.pyc
+main_playground.nf

CHANGELOG.md

Lines changed: 18 additions & 3 deletions
@@ -9,20 +9,35 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
 
 ### `Added`
 
+* **Major** Automated Cloud Tests with large-scale data on [AWS](https://aws.amazon.com/)
+* **Major** Re-wrote input logic to accept a TSV 'map' file in addition to direct paths to FASTQ
+* **Major** Lane and library merging implemented
+  * When using TSV input, one library with multiple _lanes_ will be merged together before mapping
+  * Strip FASTQ will also produce a lane merged 'raw' but 'stripped' FASTQ file
+  * When using TSV input, one sample with multiple (same treatment) libraries will be merged together.
+  * Important: direct FASTQ paths will not have this functionality. TSV is required.
 * Added sanity check and clearer error message when `--fasta_index` is provided and filepath does not end in `.fai`.
+* Added basic json_schema
+* Improved error messages
+* Added ability for automated emails using `mailutils` to also send MultiQC reports
+* General documentation additions and cleaning, updated figures with CC-BY license
 
 ### `Fixed`
 
 * [#368](https://github.com/nf-core/eager/issues/368) - Fixed the profile `test` to contain a parameter for `--paired_end`.
 * Mini bugfix for typo in line 1260+1261
-* Added basic json_schema
+* [#374](https://github.com/nf-core/eager/issues/374) - Fixed output documentation rendering not containing images
 * [#379](https://github.com/nf-core/eager/issues/378) - Fixed insufficient memory requirements for FASTQC edge case
-* Set correct recommended bwa mapping parameters from [Schubert et al. 2012](https://doi.org/10.1186/1471-2164-13-178)
+* [#390](https://github.com/nf-core/eager/issues/390) - Renamed clipped/merged output directory to be more descriptive
+* [#398](https://github.com/nf-core/eager/issues/498) - Stopped incompatible FASTA indexes being accepted
+* [#400](https://github.com/nf-core/eager/issues/400) - Set correct recommended bwa mapping parameters from [Schubert et al. 2012](https://doi.org/10.1186/1471-2164-13-178)
+* [#410](https://github.com/nf-core/eager/issues/410) - Fixed nf-core/configs not being loaded properly
+* Increase MultiQC process memory requirements to ensure enough memory for large runs
 
 ### `Dependencies`
 
 * Latest version of DeDup (0.12.6) which now reports mapped reads after deduplication
-* Latest version of ANGSD (0.933) which doesn't seg fault when running contamination on BAMs with insufficent reads
+* Latest version of ANGSD (0.933) which doesn't seg fault when running contamination on BAMs with insufficient reads
 
 ## [2.1.0] - 2020-03-05 - "Ravensburg"
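
The changelog entries above describe two merge levels for TSV input: lanes within a library, then same-treatment libraries within a sample. The following Python sketch shows one way that grouping could be planned. It is not the pipeline's Nextflow code, and the row keys (`sample`, `library`, `treatment`, `fastq`) and the function name `plan_merges` are assumptions made for illustration; the actual TSV column names are not shown in this commit view.

```python
from collections import defaultdict


def plan_merges(rows):
    """Return (lane_merges, library_merges) for a list of TSV-like row dicts.

    Row keys are hypothetical stand-ins for the real TSV columns.
    """
    lanes_by_library = defaultdict(list)
    libraries_by_sample = defaultdict(set)
    for row in rows:
        # Lane level: every FASTQ of the same (sample, library) belongs together.
        lanes_by_library[(row["sample"], row["library"])].append(row["fastq"])
        # Library level: only libraries sharing sample AND treatment are merged.
        libraries_by_sample[(row["sample"], row["treatment"])].add(row["library"])
    # Keep only groups that actually need merging (more than one member).
    lane_merges = {k: v for k, v in lanes_by_library.items() if len(v) > 1}
    library_merges = {
        k: sorted(v) for k, v in libraries_by_sample.items() if len(v) > 1
    }
    return lane_merges, library_merges
```

Grouping needs per-row metadata, which is consistent with the changelog's note that direct FASTQ paths do not get this functionality.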

README.md

Lines changed: 3 additions & 2 deletions
@@ -85,7 +85,7 @@ Additional functionality contained by the pipeline currently includes:
 
 5. Start running your own ancient DNA analysis!
 
-nextflow run nf-core/eager -profile <docker/singularity/conda> --reads '*_R{1,2}.fastq.gz' --fasta '<your_reference>.fasta'
+nextflow run nf-core/eager -profile <docker/singularity/conda> --input '*_R{1,2}.fastq.gz' --fasta '<your_reference>.fasta'
 
 6. Once your run has completed successfully, clean up the intermediate files.
 
@@ -136,11 +136,12 @@ Those who have provided conceptual guidance, suggestions, bug reports etc.
 * [Irina Velsko](https://github.com/ivelsko)
 * [Katerine Eaton](https://github.com/ktmeaton)
 * [Luc Venturini](https://github.com/lucventurini)
-* Marcel Keller
+* [Marcel Keller](https://github.com/marcel-keller)
 * [Pierre Lindenbaum](https://github.com/lindenb)
 * [Pontus Skoglund](https://github.com/pontussk)
 * [Raphael Eisenhofer](https://github.com/EisenRa)
 * [Torsten Günter](https://bitbucket.org/tguenther/)
+* [Kevin Lord](https://github.com/lordkev)
 
 If you've contributed and you're missing in here, please let us know and we will add you in of course!
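
The README change above renames `--reads` to `--input` but keeps the `'*_R{1,2}.fastq.gz'` glob, which matches forward and reverse read files that share a prefix. As a loose illustration of what that pattern selects, the Python sketch below pairs file names by prefix. The pipeline itself resolves the glob in Nextflow; the function name `pair_reads` and the regular expression are assumptions for this example only.

```python
import re
from collections import defaultdict


def pair_reads(filenames):
    """Group `<prefix>_R1.fastq.gz` and `<prefix>_R2.fastq.gz` names by prefix."""
    pattern = re.compile(r"^(?P<prefix>.+)_R(?P<read>[12])\.fastq\.gz$")
    pairs = defaultdict(dict)
    for name in filenames:
        match = pattern.match(name)
        if match:  # names that do not fit the pattern are ignored
            pairs[match["prefix"]][f"R{match['read']}"] = name
    return dict(pairs)
```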

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+

assets/multiqc_config.yaml

Lines changed: 41 additions & 14 deletions
@@ -1,10 +1,36 @@
+custom_logo: 'nf-core_eager_logo.png'
 custom_logo_url: https://github.com/nf-core/eager/
 custom_logo_title: 'nf-core/eager'
 
 report_comment: >
     This report has been generated by the <a href="https://github.com/nf-core/eager" target="_blank">nf-core/eager</a>
     analysis pipeline. For information about how to interpret these results, please see the
     <a href="https://github.com/nf-core/eager" target="_blank">documentation</a>.
+extra_fn_clean_exts:
+    - '.fastq.gz'
+    - '.fastq'
+    - '.fq.gz'
+    - '.fq'
+    - '.bam'
+    - '_fastp'
+    - '.pe.settings'
+    - '.se.settings'
+    - '.settings'
+    - '.pe.combined'
+    - '.se.truncated'
+    - '.mapped'
+    - '.mapped_rmdup'
+    - '.mapped_rmdup_stats'
+    - '_libmerged_rg_rmdup'
+    - '_libmerged_rg_rmdup_stats'
+    - '_postfilterflagstat.stats'
+    - '_flagstat.stat'
+    - '.filtered'
+    - '.filtered_rmdup'
+    - '.filtered_rmdup_stats'
+    - '_libmerged_rg_add_stats'
+    - '_rmdup'
+
 top_modules:
     - 'fastqc':
         name: 'FastQC (pre-AdapterRemoval)'
@@ -62,20 +88,21 @@ table_columns_visible:
         mapped_passed: True
     DeDup:
         dup_rate: False
-    QualiMap:
-        mean_coverage: True
-        1_x_pc: True
-        5_x_pc: True
-        percentage_aligned: False
-    mtnucratio:
-        mt_nuc_ratio: True
     DamageProfiler:
         3 Prime1: True
         3 Prime2: True
         5 Prime1: True
         5 Prime2: True
         mean_readlength: True
         median_readlength: True
+    QualiMap:
+        mean_coverage: True
+        1_x_pc: True
+        5_x_pc: True
+        percentage_aligned: False
+    mtnucratio:
+        mt_nuc_ratio: True
+
 table_columns_placement:
     FastQC (pre-AdapterRemoval):
         total_sequences: 100
@@ -96,6 +123,13 @@ table_columns_placement:
         endogenous_dna_post: 590
     DeDup:
         clusterfactor: 600
+    DamageProfiler:
+        3 Prime1: 1000
+        3 Prime2: 1010
+        5 Prime1: 1020
+        5 Prime2: 1030
+        mean_readlength: 1040
+        median_readlength: 1050
     QualiMap:
         mean_coverage: 700
         median_coverage: 710
@@ -112,13 +146,6 @@ table_columns_placement:
     sexdeterrmine:
         RateX: 900
         RateY: 910
-    DamageProfiler:
-        3 Prime1: 1000
-        3 Prime2: 1010
-        5 Prime1: 1020
-        5 Prime2: 1030
-        mean_readlength: 1040
-        median_readlength: 1050
 read_count_multiplier: 1
 read_count_prefix: ''
 read_count_desc: ''
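
The new `extra_fn_clean_exts` list in this config tells MultiQC which file-name suffixes to strip when deriving sample names, so that outputs from different pipeline steps collapse onto one sample row. The Python sketch below imitates that clean-up under the assumption that each pattern truncates the name at its first occurrence, which is my understanding of MultiQC's default mode and not a copy of its code. The function name `clean_sample_name` and the example file names are hypothetical; the pattern list is taken from the diff.

```python
# Patterns copied from the extra_fn_clean_exts block in this diff, in order.
EXTRA_FN_CLEAN_EXTS = [
    ".fastq.gz", ".fastq", ".fq.gz", ".fq", ".bam", "_fastp",
    ".pe.settings", ".se.settings", ".settings", ".pe.combined",
    ".se.truncated", ".mapped", ".mapped_rmdup", ".mapped_rmdup_stats",
    "_libmerged_rg_rmdup", "_libmerged_rg_rmdup_stats",
    "_postfilterflagstat.stats", "_flagstat.stat", ".filtered",
    ".filtered_rmdup", ".filtered_rmdup_stats", "_libmerged_rg_add_stats",
    "_rmdup",
]


def clean_sample_name(name, patterns=EXTRA_FN_CLEAN_EXTS):
    """Truncate `name` at the first occurrence of each pattern, in list order."""
    for pattern in patterns:
        # split(..., 1)[0] keeps everything before the first match, if any.
        name = name.split(pattern, 1)[0]
    return name
```

Under this assumption, a hypothetical `sampleA.pe.combined.fastq.gz` and `sampleA_libmerged_rg_rmdup_stats` would both reduce to `sampleA`.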

conf/test_tsv.config

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+/*
+ * -------------------------------------------------
+ *  Nextflow config file for running tests
+ * -------------------------------------------------
+ * Defines bundled input files and everything required
+ * to run a fast and simple test. Use as follows:
+ *   nextflow run nf-core/eager -profile test,docker (or singularity, or conda)
+ */
+
+params {
+  config_profile_name = 'Test profile'
+  config_profile_description = 'Minimal test dataset to check pipeline function'
+  // Limit resources so that this can run on GitHub Actions
+  max_cpus = 2
+  max_memory = 6.GB
+  max_time = 48.h
+  genome = false
+  // Input data
+  input = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/testdata/Mammoth/mammoth_design_fastq.tsv'
+  // Genome references
+  fasta = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Mammoth/Mammoth_MT_Krause.fasta'
+}
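
The `max_cpus`, `max_memory` and `max_time` parameters in this test config act as ceilings so that no process asks for more than a CI runner can offer. The Python sketch below illustrates the capping idea with the values from this config. It is not the pipeline's Groovy logic, and the names `TEST_LIMITS` and `cap_resources` are hypothetical.

```python
# Ceilings taken from the test config: 2 CPUs, 6 GB memory, 48 hours.
TEST_LIMITS = {"cpus": 2, "memory_gb": 6, "time_h": 48}


def cap_resources(request, limits=TEST_LIMITS):
    """Return a copy of `request` with each value clamped to its ceiling."""
    return {key: min(value, limits[key]) for key, value in request.items()}
```

A process asking for 8 CPUs, 4 GB and 72 hours would be reduced to 2 CPUs, 4 GB and 48 hours under these limits.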

conf/test_tsv_bam.config

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+/*
+ * -------------------------------------------------
+ *  Nextflow config file for running tests
+ * -------------------------------------------------
+ * Defines bundled input files and everything required
+ * to run a fast and simple test. Use as follows:
+ *   nextflow run nf-core/eager -profile test,docker (or singularity, or conda)
+ */
+
+params {
+  config_profile_name = 'Test profile'
+  config_profile_description = 'Minimal test dataset to check pipeline function'
+  // Limit resources so that this can run on GitHub Actions
+  max_cpus = 2
+  max_memory = 6.GB
+  max_time = 48.h
+  genome = false
+  // Input data
+  input = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/testdata/Mammoth/mammoth_design_bam.tsv'
+  // Genome references
+  fasta = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Mammoth/Mammoth_MT_Krause.fasta'
+}

conf/test_tsv_complex.config

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+/*
+ * -------------------------------------------------
+ *  Nextflow config file for running tests
+ * -------------------------------------------------
+ * Defines bundled input files and everything required
+ * to run a fast and simple test. Use as follows:
+ *   nextflow run nf-core/eager -profile test,docker (or singularity, or conda)
+ */
+
+params {
+  config_profile_name = 'Test profile'
+  config_profile_description = 'Minimal test dataset to check pipeline function'
+  // Limit resources so that this can run on GitHub Actions
+  max_cpus = 2
+  max_memory = 6.GB
+  max_time = 48.h
+  genome = false
+  // Input data
+  input = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/testdata/Mammoth/mammoth_design_fastq_multilane_multilib.tsv'
+  // Genome references
+  fasta = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Mammoth/Mammoth_MT_Krause.fasta'
+}
