Merged
56 commits
5dec901
Adjusted fixes for AWS index handling
apeltzer Jul 12, 2020
a19e32b
Fix supplied indices and typo
apeltzer Jul 12, 2020
6b6d9ca
Fix test
apeltzer Jul 12, 2020
65048fc
Zipped fasta it is
apeltzer Jul 12, 2020
8be5066
Fix some more indices
apeltzer Jul 12, 2020
fd43ba8
Fix path
apeltzer Jul 12, 2020
dea0244
Fix strip_input
apeltzer Jul 12, 2020
ee4721f
Fix some more paths
apeltzer Jul 12, 2020
96fff7a
Fix small malt issue
apeltzer Jul 13, 2020
95ccd01
Merge branch 'dev' into fix-aws
apeltzer Jul 13, 2020
e1a5782
Atttempt to fix last tmissing bits for index handlign
apeltzer Jul 13, 2020
3a9d847
Merge branch 'fix-aws' of https://github.com/nf-core/eager into fix-aws
apeltzer Jul 13, 2020
e09082c
Some more fixes
apeltzer Jul 13, 2020
1e5c8a4
Prefix with def to fix
apeltzer Jul 13, 2020
fbb5309
Make new dummy file replace Class() with instanceof and remove dummy NAs
jfy133 Jul 14, 2020
c20eb50
One more
jfy133 Jul 14, 2020
d13a078
Stop file collisions
jfy133 Jul 14, 2020
1e4035b
Remove debugging code
jfy133 Jul 14, 2020
1096321
Remove last code TODO after local testing
jfy133 Jul 14, 2020
bbb210f
Merge branch 'fix-aws' into aws-syncing
jfy133 Jul 16, 2020
f109a80
Merge pull request #514 from nf-core/aws-syncing
jfy133 Jul 16, 2020
fe4e8a3
Make bwa input as path
jfy133 Jul 16, 2020
a477b0d
Remove duplicate strip_fastq process
jfy133 Jul 16, 2020
ae33488
Fix inaccesible BAMs due to grouping of single elements when skipping…
jfy133 Jul 16, 2020
0330c6b
Remove debugging stuff
jfy133 Jul 16, 2020
827d585
Add some more dumps
apeltzer Jul 16, 2020
cd2ecb5
Add benchmark_vikingfish_single
apeltzer Jul 16, 2020
bc59f45
Replace jfy133 with nf-core
apeltzer Jul 16, 2020
2477526
Dummy file added
apeltzer Jul 16, 2020
47993ef
Try ensuring all read inputs into mapping are single elements
jfy133 Jul 21, 2020
139b357
Merge branch 'fix-aws' of github.com:nf-core/eager into fix-aws
jfy133 Jul 21, 2020
a474e6b
Fix broken tag for MarkDuplicates
jfy133 Jul 27, 2020
82a15f6
Clean markdupped BAM name for MultiQC and hide occasional qualimap co…
jfy133 Jul 27, 2020
f6393fa
Fix circularmapper and remove some leftover dumps
jfy133 Jul 27, 2020
8a0bce0
Tweaked general stats (and output.md) to make more logical flow of ma…
jfy133 Jul 27, 2020
08d45e8
Start fixing no-preseq when markdup
jfy133 Jul 28, 2020
bb1465b
Fixed and documented preseq not running when using markdups
jfy133 Jul 28, 2020
baaa430
Fix GATK UG always publishing BAIs even if not requested
jfy133 Jul 28, 2020
e8635bc
Fix occasional pre BAMtrimming library_merge failure
jfy133 Jul 28, 2020
59d710f
Fix PMD param in validations
jfy133 Jul 28, 2020
a43f4a4
Merge remote-tracking branch 'upstream/dev' into fix-aws
apeltzer Jul 28, 2020
2fe4c53
Fix ci
apeltzer Jul 28, 2020
5c49e6f
Fix supplying fasta Index
apeltzer Jul 28, 2020
424651e
Fix CI
apeltzer Jul 28, 2020
6ebd4b6
Remove failed arrayfy hack
jfy133 Jul 28, 2020
f2cf2a5
Merge branch 'fix-aws' of github.com:nf-core/eager into fix-aws
jfy133 Jul 28, 2020
1d7d53e
Fix bad R2 when not existing in lanemerge
jfy133 Jul 28, 2020
1d66da7
Fix downstream poop-gen tools NO_FILE
jfy133 Jul 29, 2020
5a9cb6c
Fix strip merged FASTQ
jfy133 Jul 29, 2020
0b39ce6
Fix pileupcaller
jfy133 Jul 29, 2020
48783d9
Merge branch 'fix-aws' into minor-fixes
jfy133 Aug 3, 2020
d648742
Merge pull request #534 from nf-core/minor-fixes
jfy133 Aug 3, 2020
8225a8c
Clarify name when non-tsv and fix missing skipme merging after lane …
jfy133 Aug 3, 2020
247f43e
Remove leftover view and run strip_Fastq test on more appropriate tSV
jfy133 Aug 3, 2020
c0fe4dc
Send NucContam JSON to MQC
jfy133 Aug 3, 2020
e2a8b84
Fix linting
jfy133 Aug 3, 2020

4 changes: 2 additions & 2 deletions .github/workflows/ci.yml
@@ -64,7 +64,7 @@ jobs:
nextflow run ${GITHUB_WORKSPACE} -profile test_tsv,docker --save_reference
- name: REFERENCE Basic workflow, with supplied indices
run: |
nextflow run ${GITHUB_WORKSPACE} -profile test_tsv,docker --bwa_index 'results/reference_genome/bwa_index/BWAIndex/Mammoth_MT_Krause.fasta' --fasta_index 'https://github.com/nf-core/test-datasets/blob/eager/reference/Mammoth/Mammoth_MT_Krause.fasta.fai'
nextflow run ${GITHUB_WORKSPACE} -profile test_tsv,docker --bwa_index 'results/reference_genome/bwa_index/BWAIndex/' --fasta_index 'https://github.com/nf-core/test-datasets/blob/eager/reference/Mammoth/Mammoth_MT_Krause.fasta.fai'
- name: REFERENCE Run the basic pipeline with FastA reference with `fna` extension
run: |
nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_fna,docker
@@ -103,7 +103,7 @@ jobs:
nextflow run ${GITHUB_WORKSPACE} -profile test_tsv,docker --mapper 'bowtie2' --bt2_alignmode 'local' --bt2_sensitivity 'sensitive' --bt2n 1 --bt2l 16 --bt2_trim5 1 --bt2_trim3 1
- name: STRIP_FASTQ Run the basic pipeline with output unmapped reads as fastq
run: |
nextflow run ${GITHUB_WORKSPACE} -profile test_tsv,docker --strip_input_fastq
nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_complex,docker --strip_input_fastq
- name: BAM_FILTERING Run basic mapping pipeline with mapping quality filtering, and unmapped export
run: |
nextflow run ${GITHUB_WORKSPACE} -profile test_tsv,docker --run_bam_filtering --bam_mapping_quality_threshold 37 --bam_unmapped_type 'fastq'
1 change: 0 additions & 1 deletion assets/dummy.txt

This file was deleted.

9 changes: 5 additions & 4 deletions assets/multiqc_config.yaml
@@ -161,6 +161,7 @@ table_columns_visible:
1_x_pc: True
5_x_pc: True
percentage_aligned: False
median_insert_size: False
MultiVCFAnalyzer:
Heterozygous SNP alleles (percent): True
endorSpy:
@@ -204,11 +205,11 @@ table_columns_placement:
flagstat_total: 551
mapped_passed: 552
Samtools Flagstat (post-samtools filter):
flagstat_total: 553
mapped_passed: 554
flagstat_total: 600
mapped_passed: 620
endorSpy:
endogenous_dna: 600
endogenous_dna_post: 610
endogenous_dna: 610
endogenous_dna_post: 640
nuclear_contamination:
Num_SNPs: 1100
Method1_MOM_estimate: 1110
1 change: 1 addition & 0 deletions assets/nf-core_eager_dummy.txt
@@ -0,0 +1 @@
This is a dummy file for when we need a 'fake' file to satisfy all nextflow channel inputs being filled, even if we actually only use one.
1 change: 1 addition & 0 deletions assets/nf-core_eager_dummy2.txt
@@ -0,0 +1 @@
This is a second dummy file for when we need a 'fake' file to satisfy all nextflow channel inputs being filled, even if we actually only use one.
2 changes: 1 addition & 1 deletion conf/benchmarking_vikingfish.config
@@ -12,7 +12,7 @@ params {
config_profile_description = "A 'fullsized' benchmarking profile for deepish sequencing aDNA data"

//Input data
input = 'https://raw.githubusercontent.com/jfy133/test-datasets/eager/testdata/Benchmarking/benchmarking_vikingfish.tsv'
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/testdata/Benchmarking/benchmarking_vikingfish.tsv'
// Genome reference
fasta = 'https://ftp.ncbi.nlm.nih.gov/genomes/refseq/vertebrate_other/Gadus_morhua/representative/GCF_902167405.1_gadMor3.0/GCF_902167405.1_gadMor3.0_genomic.fna.gz'

56 changes: 56 additions & 0 deletions conf/benchmarking_vikingfish_single.config
@@ -0,0 +1,56 @@
/*
* -------------------------------------------------
* Nextflow config file for running tests
* -------------------------------------------------
* Defines bundled input files and everything required
* to run a fast and simple test. Use as follows:
* nextflow run nf-core/eager -profile test,docker (or singularity, or conda)
*/

params {
config_profile_name = 'nf-core/eager benchmarking - Viking Fish profile'
config_profile_description = "A 'fullsized' benchmarking profile for deepish sequencing aDNA data"

//Input data
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/testdata/Benchmarking/benchmarking_vikingfish_single.tsv'
// Genome reference
fasta = 'https://ftp.ncbi.nlm.nih.gov/genomes/refseq/vertebrate_other/Gadus_morhua/representative/GCF_902167405.1_gadMor3.0/GCF_902167405.1_gadMor3.0_genomic.fna.gz'

bwaalnn = 0.04
bwaalnl = 1024

run_bam_filtering = true
bam_discard_unmapped = true
bam_unmapped_type = 'discard'
bam_mapping_quality_threshold = 25

run_genotyping = true
genotyping_tool = 'hc'
genotyping_source = 'raw'
gatk_ploidy = 2

}

process {
withName:'adapter_removal'{
cpus = { check_max( 8, 'cpus' ) }
memory = { check_max( 16.GB * task.attempt, 'memory' ) }
time = { check_max( 2.h * task.attempt, 'time' ) }
}
withName:'bwa'{
cpus = { check_max( 8, 'cpus' ) }
memory = { check_max( 16.GB * task.attempt, 'memory' ) }
time = { check_max( 8.h * task.attempt, 'time' ) }
}
withName:'dedup'{
cpus = { check_max( 8, 'cpus' ) }
memory = { check_max( 16.GB * task.attempt, 'memory' ) }
time = { check_max( 4.h * task.attempt, 'time' ) }
}
withName:'genotyping_hc'{
cpus = { check_max( 8, 'cpus' ) }
memory = { check_max( 16.GB * task.attempt, 'memory' ) }
time = { check_max( 8.h * task.attempt, 'time' ) }
}

}
5 changes: 3 additions & 2 deletions docs/output.md
@@ -98,15 +98,16 @@ The possible columns displayed by default are as follows:
- **Mappability** This is from MALT. It reports the percentage of off-target reads (from mapping) that could map to your MALT metagenomic database. This can often be low for aDNA due to short reads and database bias.
- **% Unclassified** This is from Kraken. It reports the percentage of reads that could not be aligned and taxonomically assigned against your Kraken metagenomic database. This can often be high for aDNA due to short reads and database bias.
- **Reads Mapped** This is from Samtools. This is the raw number of preprocessed reads mapped to your reference genome _prior to_ mapping quality filtering and deduplication.
- **Reads Mapped** This is from Samtools. This is the raw number of preprocessed reads mapped to your reference genome _after_ mapping quality filtering and deduplication (note that the column name is the same as the pre-filtering column, but the post-filtering column is always displayed second).
- **Endogenous DNA (%)** This is from the endorS.py tool. It displays the percentage of mapped reads over the total reads that went into mapping (i.e. the percentage of DNA content in the library that matches the reference). Assuming a perfect ancient sample with no modern contamination, this would be the amount of true ancient DNA in the sample. However, this value _most likely_ includes contamination and will not entirely reflect the true 'endogenous' content.
- **Reads Mapped** This is from Samtools. This is the raw number of preprocessed reads mapped to your reference genome _after_ mapping quality filtering and deduplication (note that the column name is the same as the pre-filtering column, but the post-filtering column is always displayed second).
- **Endogenous DNA Post (%)** This is from the endorS.py tool. It displays the percentage of mapped reads _after_ BAM filtering (e.g. for mapping quality) over the total reads that went into mapping (i.e. the percentage of DNA content in the library that matches the reference). This column will only be displayed if BAM filtering is turned on; the total read count comes from the original mapping, and the mapped read count from the post-filtering BAM.
- **ClusterFactor** This is from DeDup. This value represents how many duplicates of each unique read exist in the library. A cluster factor close to one indicates a highly complex library that could be sequenced further. Generally, with a value of more than 2 you will not gain much more information by sequencing deeper.
- **Dups** This is from Picard's MarkDuplicates. It represents the percentage of reads in your library that were exact duplicates of other reads in your library. The lower the better, as a high duplication rate means lots of sequencing of the same information (and is therefore not time- or cost-effective).
- **X Prime Y>Z N base** These columns are from DamageProfiler. The prime numbers indicate which end of the reads the damage refers to. The Y>Z is the type of substitution (C>T is the true damage, G>A is the complementary substitution). For non- and half-UDG-treated libraries, you should see a decrease in frequency from the 1st to the 2nd base.
- **Mean Read Length** This is from DamageProfiler. This is the mean length of all deduplicated mapped reads. Ancient DNA will normally have a mean between 30 and 75 bp, however this can vary.
- **Median Read Length** This is from DamageProfiler. This is the median length of all deduplicated mapped reads. Ancient DNA will normally have a median between 30 and 75 bp, however this can vary.
- **Coverage** This is from Qualimap. This is the median number of times a base on your reference genome was covered by a read (i.e. depth coverage). This average includes bases with 0 reads covering that position.
- **Aligned** This is from Qualimap. This is the total number of _deduplicated_ reads that mapped to your reference genome.
- **Mean/Median Coverage** This is from Qualimap. This is the mean/median number of times a base on your reference genome was covered by a read (i.e. depth coverage). This average includes bases with 0 reads covering that position.
- **>= 1X** to **>= 5X** These are from Qualimap. They show the percentage of the genome covered at that particular depth of coverage.
- **% GC** This is the mean GC content in percent of all mapped reads post-deduplication. This should normally be close to the GC content of your reference genome.
- **MT to Nuclear Ratio** This is from MTtoNucRatio. This reports the ratio of reads aligned to a mitochondrial entry in your reference FASTA to reads aligned to all other entries. This will typically be high but will vary depending on tissue type.
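
To make the relationship between the two endorS.py columns concrete, here is a minimal Python sketch that applies the mapped-over-total definition given above to hypothetical read counts. It is an illustration only, not endorS.py's own code; the function name and the numbers are made up.

```python
def endogenous_dna_pct(mapped_reads: int, total_reads: int) -> float:
    """Percentage of reads mapped to the reference out of all reads that went into mapping."""
    if total_reads <= 0:
        raise ValueError("total_reads must be a positive number")
    return 100.0 * mapped_reads / total_reads


# Hypothetical counts, for illustration only
total_reads = 1_000_000   # reads that went into mapping
mapped_raw = 120_000      # mapped reads in the original BAM
mapped_filtered = 90_000  # mapped reads left after BAM filtering (e.g. mapping quality)

# 'Endogenous DNA (%)' uses the mapped count from the original mapping
print(f"Endogenous DNA (%):      {endogenous_dna_pct(mapped_raw, total_reads):.2f}")
# 'Endogenous DNA Post (%)' keeps the same denominator but uses the filtered mapped count
print(f"Endogenous DNA Post (%): {endogenous_dna_pct(mapped_filtered, total_reads):.2f}")
```

Because both columns share the same denominator and filtering only removes reads, the post-filtering value should never exceed the pre-filtering one.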
7 changes: 5 additions & 2 deletions docs/usage.md
@@ -192,6 +192,7 @@ If you have multiple files in different directories, you can use additional wild
4. When using the pipeline with **paired end data**, the path must use `{1,2}` notation to specify read pairs.
5. File names must be unique; having files with the same name but in different directories is _not_ sufficient.
- This can happen when a library has been sequenced across two sequencers on the same lane. Either rename the file, try a symlink with a unique name, or merge the two FASTQ files prior to input.
6. Due to limitations of downstream tools (e.g. FastQC), sample IDs may be truncated after the first `.` in the name. Ensure file names are unique before this point!
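
One way to catch such clashes before launching a run is to apply the truncation yourself. The Python sketch below assumes that truncation keeps everything before the first `.` of the base file name, as described in point 6; `truncated_id` and `find_id_collisions` are hypothetical helpers, not part of nf-core/eager.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def truncated_id(filename: str) -> str:
    """Return the part of the base file name before the first '.'."""
    return Path(filename).name.split(".", 1)[0]


def find_id_collisions(filenames: List[str]) -> Dict[str, List[str]]:
    """Group files by truncated ID and keep only IDs shared by more than one file."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in filenames:
        groups[truncated_id(name)].append(name)
    return {sample_id: files for sample_id, files in groups.items() if len(files) > 1}


# Hypothetical R1 files: the first two collapse to the same ID 'sampleA'
files = [
    "run1/sampleA.L001_R1.fq.gz",
    "run2/sampleA.L002_R1.fq.gz",
    "run1/sampleB_L001_R1.fq.gz",
]
print(find_id_collisions(files))
```

Run it on only the R1 (or single-end) files, since the two mates of a pair may legitimately share a prefix.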

##### TSV Input Method

@@ -362,7 +363,7 @@ Use this if you do not have pre-made reference FASTA indices for `bwa`, `samtool

#### `--bwa_index`

If you want to use pre-existing `bwa index` indices, please supply the path **and file** to the FASTA you also specified in `--fasta` (see above). EAGER2 will automagically detect the index files by searching for the FASTA filename with the corresponding `bwa` index file suffixes.
If you want to use pre-existing `bwa index` indices, please supply the **directory** to the FASTA you also specified in `--fasta` (see above). EAGER2 will automagically detect the index files by searching for the FASTA filename with the corresponding `bwa` index file suffixes.
Suggested change
If you want to use pre-existing `bwa index` indices, please supply the **directory** to the FASTA you also specified in `--fasta` (see above). EAGER2 will automagically detect the index files by searching for the FASTA filename with the corresponding `bwa` index file suffixes.
If you want to use pre-existing `bwa index` indices, please supply the **directory** to the FASTA you also specified in `--fasta` (see above). nf-core/eager will automagically detect the index files by searching for the FASTA filename with the corresponding `bwa` index file suffixes.


For example:

Expand All @@ -371,7 +372,7 @@ nextflow run nf-core/eager \
-profile test,docker \
--input '*{R1,R2}*.fq.gz' \
--fasta 'results/reference_genome/bwa_index/BWAIndex/Mammoth_MT_Krause.fasta' \
--bwa_index 'results/reference_genome/bwa_index/BWAIndex/Mammoth_MT_Krause.fasta'
--bwa_index 'results/reference_genome/bwa_index/BWAIndex/'
```

> `bwa index` does not give you an option to supply alternative suffixes/names for these indices. Thus, the file names generated by this command _must not_ be changed, otherwise EAGER2 will not be able to find them.
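
The suffix-based lookup can be sketched in Python as follows. This assumes the five files that `bwa index` writes by default when no custom prefix is given (`<fasta name>.amb`, `.ann`, `.bwt`, `.pac` and `.sa`) and uses hypothetical helper names; it only illustrates why renaming the index files breaks detection and is not nf-core/eager's own code.

```python
from pathlib import Path
from typing import List

# Files written by `bwa index <fasta>` when no custom prefix is given
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def expected_bwa_index_paths(index_dir: str, fasta_path: str) -> List[Path]:
    """Build the index file paths expected in index_dir for the given FASTA."""
    fasta_name = Path(fasta_path).name  # e.g. 'Mammoth_MT_Krause.fasta'
    return [Path(index_dir) / f"{fasta_name}{suffix}" for suffix in BWA_INDEX_SUFFIXES]


def missing_bwa_index_files(index_dir: str, fasta_path: str) -> List[Path]:
    """Return the expected index files that do not exist on disk."""
    return [p for p in expected_bwa_index_paths(index_dir, fasta_path) if not p.is_file()]


missing = missing_bwa_index_files(
    "results/reference_genome/bwa_index/BWAIndex/",
    "results/reference_genome/bwa_index/BWAIndex/Mammoth_MT_Krause.fasta",
)
print("All bwa index files found" if not missing else f"Missing: {[str(p) for p in missing]}")
```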
@@ -734,6 +735,8 @@ Sets DeDup to treat all reads as merged reads. This is useful if reads are for e

### Library Complexity Estimation Parameters

nf-core/eager uses Preseq on mapped reads as one method to calculate library complexity. If DeDup is used, Preseq uses the histogram output of DeDup; otherwise, the sorted, non-deduplicated BAM file is supplied. Furthermore, if paired-end read collapsing is not performed, the `-P` flag is used.
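
The input rules in the paragraph above can be summarised with a small Python sketch. `choose_preseq_input` and `PreseqInput` are hypothetical names, the input descriptions are plain labels rather than real file paths, and the flag logic only restates the paragraph (assuming `-P` is added for paired-end data that was not collapsed); it is not the pipeline's actual process definition.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PreseqInput:
    source: str                                     # which file Preseq is given
    flags: List[str] = field(default_factory=list)  # extra c_curve flags


def choose_preseq_input(used_dedup: bool, paired_end: bool, reads_collapsed: bool) -> PreseqInput:
    """Pick the Preseq input following the rules described in the documentation."""
    source = "DeDup histogram" if used_dedup else "sorted BAM"
    # Assumption: paired-end reads that were not collapsed get Preseq's -P flag
    flags = ["-P"] if paired_end and not reads_collapsed else []
    return PreseqInput(source=source, flags=flags)


print(choose_preseq_input(used_dedup=True, paired_end=True, reads_collapsed=True))
print(choose_preseq_input(used_dedup=False, paired_end=True, reads_collapsed=False))
```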

#### `--preseq_step_size`

Can be used to configure the step size of Preseq's `c_curve` method. This can be useful when only a few, and thus shallow, sequencing results are used for extrapolation.