Merged
36 commits
ffabee7
Get base requirements set up
apeltzer May 9, 2019
406f18c
Merge remote-tracking branch 'upstream/dev' into dev
apeltzer Sep 6, 2019
89116c7
Merge remote-tracking branch 'upstream/dev' into dev
apeltzer Oct 21, 2019
065ae50
Start adding mouse data
apeltzer Oct 21, 2019
77ec187
Update iGenomes.config
apeltzer Oct 21, 2019
db69d29
Add tbi
apeltzer Oct 21, 2019
8995981
Drop ASCAT files
apeltzer Oct 21, 2019
0d1e56c
Use Version 98 of Mouse
apeltzer Oct 28, 2019
9195f0f
Add for grcm38
apeltzer Oct 28, 2019
228592d
Adjust mus musculus DB
apeltzer Oct 28, 2019
e216b24
Annotation
apeltzer Oct 28, 2019
0dacda1
Adjusted genomes.config
apeltzer Oct 30, 2019
9a35195
Should be list
apeltzer Oct 30, 2019
e6c07a5
Set genomes_base to something
apeltzer Oct 30, 2019
934622d
Revert back
apeltzer Oct 30, 2019
985af84
Add proper calling list
apeltzer Oct 30, 2019
da5ea62
Use the bed file
apeltzer Oct 30, 2019
888a220
Fix genome fa.fai
apeltzer Oct 30, 2019
17e1f23
Add in mgpv5
apeltzer Oct 30, 2019
c3c6e93
Try short track
apeltzer Oct 31, 2019
7fa9756
Add in species handling
apeltzer Oct 31, 2019
2f9530f
Document new parameter species
apeltzer Oct 31, 2019
30e6b4c
Add changelog
apeltzer Oct 31, 2019
c86114a
Fix iGenomes stuff
apeltzer Oct 31, 2019
9ba7c6f
Add in note about GRCm38
apeltzer Oct 31, 2019
624bab1
Merge branch 'dev' into dev
maxulysse Oct 31, 2019
b33576b
Fix small fai index issue
apeltzer Oct 31, 2019
af66811
Merge branch 'dev' of https://github.com/apeltzer/sarek into dev
apeltzer Oct 31, 2019
c70dd71
Adjusted quotes in genomes.config
apeltzer Oct 31, 2019
fffc46f
And the same for igenomes
apeltzer Oct 31, 2019
2c821e4
Better folder structure for Mouse Genome Project data
apeltzer Oct 31, 2019
0925189
Minor adjustment to propoer paths
apeltzer Oct 31, 2019
12a5152
Apply suggestions from code review
apeltzer Oct 31, 2019
92be499
Remove space
apeltzer Oct 31, 2019
98ea898
Move it up
apeltzer Oct 31, 2019
bf3f262
Update CHANGELOG.md
apeltzer Oct 31, 2019
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
### `Added`

- [#46](https://github.com/nf-core/sarek/pull/46) - Add location to abstacts
+ - [#52](https://github.com/nf-core/sarek/pull/52) - Add support for mouse data `GRCm38`

### `Changed`

31 changes: 25 additions & 6 deletions conf/genomes.config
@@ -25,8 +25,9 @@ params {
intervals = "${params.genomes_base}/wgs_calling_regions_Sarek.list"
knownIndels = "${params.genomes_base}/{1000G_phase1,Mills_and_1000G_gold_standard}.indels.b37.vcf"
knownIndelsIndex = "${params.genomes_base}/{1000G_phase1,Mills_and_1000G_gold_standard}.indels.b37.vcf.idx"
- snpeffDb = "GRCh37.75"
- vepCacheVersion = "95"
+ snpeffDb = 'GRCh37.75'
+ vepCacheVersion = '95'
+ species = 'homo_sapiens'
}
'GRCh38' {
acLoci = "${params.genomes_base}/1000G_phase3_GRCh38_maf0.3.loci"
@@ -44,8 +45,9 @@ params {
intervals = "${params.genomes_base}/wgs_calling_regions.hg38.bed"
knownIndels = "${params.genomes_base}/{Mills_and_1000G_gold_standard.indels.hg38,beta/Homo_sapiens_assembly38.known_indels}.vcf.gz"
knownIndelsIndex = "${params.genomes_base}/{Mills_and_1000G_gold_standard.indels.hg38,beta/Homo_sapiens_assembly38.known_indels}.vcf.gz.tbi"
- snpeffDb = "GRCh38.86"
- vepCacheVersion = "95"
+ snpeffDb = 'GRCh38.86'
+ vepCacheVersion = '95'
+ species = 'homo_sapiens'
}
'smallGRCh37' {
acLoci = "${params.genomes_base}/1000G_phase3_20130502_SNP_maf0.3.small.loci"
@@ -55,8 +57,25 @@ params {
germlineResource = "${params.genomes_base}/dbsnp_138.b37.small.vcf.gz"
intervals = "${params.genomes_base}/small.intervals"
knownIndels = ["${params.genomes_base}/1000G_phase1.indels.b37.small.vcf.gz", "${params.genomes_base}/Mills_and_1000G_gold_standard.indels.b37.small.vcf.gz"]
- snpeffDb = "GRCh37.75"
- vepCacheVersion = "95"
+ snpeffDb = 'GRCh37.75'
+ vepCacheVersion = '95'
+ species = 'homo_sapiens'
}
+ 'GRCm38' {
+ bwaIndex = "${params.genomes_base}/genome.fa.{amb,ann,bwt,pac,sa}"
+ chrDir = "${params.genomes_base}/Chromosomes"
+ chrLength = "${params.genomes_base}/GRCm38.len"
+ dbsnp = "${params.genomes_base}/mgp.v5.merged.snps_all.dbSNP142.vcf.gz"
+ dbsnpIndex = "${params.genomes_base}/mgp.v5.merged.snps_all.dbSNP142.vcf.gz.tbi"
+ dict = "${params.genomes_base}/genome.dict"
+ fasta = "${params.genomes_base}/genome.fa"
+ fastaFai = "${params.genomes_base}/genome.fa.fai"
+ intervals = "${params.genomes_base}/GRCm38_calling_list.bed"
+ knownIndels = "${params.genomes_base}/mgp.v5.merged.indels.dbSNP142.normed.vcf.gz"
+ knownIndelsIndex = "${params.genomes_base}/mgp.v5.merged.indels.dbSNP142.normed.vcf.gz.tbi"
+ snpeffDb = 'GRCm38.86'
+ vepCacheVersion = '98'
+ species = 'mus_musculus'
+ }
}
}
26 changes: 22 additions & 4 deletions conf/igenomes.config
@@ -25,8 +25,9 @@ params {
intervals = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/intervals/wgs_calling_regions_Sarek.list"
knownIndels = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/GATKBundle/{1000G_phase1,Mills_and_1000G_gold_standard}.indels.b37.vcf"
knownIndelsIndex = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/GATKBundle/{1000G_phase1,Mills_and_1000G_gold_standard}.indels.b37.vcf.idx"
- snpeffDb = "GRCh37.75"
- vepCacheVersion = "95"
+ snpeffDb = 'GRCh37.75'
+ vepCacheVersion = '95'
+ species = 'homo_sapiens'
}
'GRCh38' {
acLoci = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Annotation/ASCAT/1000G_phase3_GRCh38_maf0.3.loci"
@@ -44,8 +45,25 @@ params {
intervals = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Annotation/intervals/wgs_calling_regions.hg38.bed"
knownIndels = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/{Mills_and_1000G_gold_standard.indels.hg38,beta/Homo_sapiens_assembly38.known_indels}.vcf.gz"
knownIndelsIndex = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/{Mills_and_1000G_gold_standard.indels.hg38,beta/Homo_sapiens_assembly38.known_indels}.vcf.gz.tbi"
- snpeffDb = "GRCh38.86"
- vepCacheVersion = "95"
+ snpeffDb = 'GRCh38.86'
+ vepCacheVersion = '95'
+ species = 'homo_sapiens'
}
+ 'GRCm38' {
+ bwaIndex = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BWAIndex/genome.fa.{amb,ann,bwt,pac,sa}"
+ chrDir = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/Chromosomes"
+ chrLength = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/Length/GRCm38.len"
+ dbsnp = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/MouseGenomeProject/Annotation/mgp.v5.merged.snps_all.dbSNP142.vcf.gz"
Member

Where did you download these files from @apeltzer ? Looks like they are the latest but may be worth documenting that somewhere e.g.

> dbSNP files for GRCm38 were downloaded from ftp://ftp-mouse.sanger.ac.uk/REL-1505-SNPs_Indels/ on [date].

Member

These files are quite difficult to find so worth being a bit more transparent whilst things are fresh!

Member Author

I've added a PR to AWS-iGenomes where I'm adding these already:

ewels/AWS-iGenomes#7

I didn't know exactly where this information should go, but that felt like the logical place: others who use the files without Sarek can then also find out where I got them from.

The .fai files and .dict match. I double-checked that and also ran local test data (~50GB) with these changes to check that everything is fine 👍

+ dbsnpIndex = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/MouseGenomeProject/Annotation/mgp.v5.merged.snps_all.dbSNP142.vcf.gz.tbi"
+ dict = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/genome.dict"
+ fasta = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/genome.fa"
+ fastaFai = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/genome.fa.fai"
+ intervals = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/intervals/GRCm38_calling_list.bed"
+ knownIndels = "${params.igenomes_base}/Mus_musculus/Annotation/MouseGenomeProject/mgp.v5.merged.indels.dbSNP142.normed.vcf.gz"
+ knownIndelsIndex = "${params.igenomes_base}/Mus_musculus/Annotation/MouseGenomeProject/mgp.v5.merged.indels.dbSNP142.normed.vcf.gz.tbi"
+ snpeffDb = 'GRCm38.86'
+ vepCacheVersion = '98'
+ species = 'mus_musculus'
+ }
}
}
2 changes: 1 addition & 1 deletion docs/reference.md
@@ -4,7 +4,7 @@

Sarek is using [AWS iGenomes](https://ewels.github.io/AWS-iGenomes/), which facilitate storing and sharing references.
Sarek currently uses `GRCh38` by default.
- Both `GRCh37` and `GRCh38` are available with `--genome GRCh37` or `--genome GRCh38` respectively with any profile using the `conf/igenomes.config` file, or you can specify it with `-c conf/igenomes.config`.
+ `GRCh37`, `GRCh38` and `GRCm38` are available with `--genome GRCh37`, `--genome GRCh38` or `--genome GRCm38` respectively with any profile using the `conf/igenomes.config` file, or you can specify it with `-c conf/igenomes.config`.
Use `--genome smallGRCh37` to map against a small reference genome based on GRCh37.
Settings in `igenomes.config` can be tailored to your needs.

5 changes: 5 additions & 0 deletions docs/usage.md
@@ -47,6 +47,7 @@
* [`--snpeffDb`](#--snpeffdb)
* [`--vepCacheVersion`](#--vepcacheversion)
* [`--igenomesIgnore`](#--igenomesignore)
+ * [`--species`](#--species)
* [Job resources](#job-resources)
* [Automatic resubmission](#automatic-resubmission)
* [Custom resource requests](#custom-resource-requests)
@@ -514,6 +515,10 @@ If you prefer, you can specify the cache version when you run the pipeline:
Do not load `igenomes.config` when running the pipeline.
You may choose this option if you observe clashes between custom parameters and those supplied in `igenomes.config`.

+ ### `--species`
+
+ This specifies the species used for running VEP annotation. For human data, this needs to be set to `homo_sapiens`, for mouse data `mus_musculus` as the annotation needs to know where to look for appropriate annotation references. If you use iGenomes or a local resource with `genomes.conf`, this has already been set for you appropriately.

## Job resources

### Automatic resubmission
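For illustration only, the `--species` option documented in the `docs/usage.md` hunk above could presumably also be set from a custom configuration file when neither iGenomes nor `genomes.config` is used. The sketch below is not part of this PR: the file name `custom.config` is hypothetical, it assumes the pipeline reads these values from the `params` scope in the same way as the command-line options, and the values are simply the `GRCm38` ones that appear in `conf/genomes.config`.

```groovy
// custom.config: hypothetical file name, not part of this PR.
// Assumption: each entry mirrors the command-line option of the same name
// (--genome, --species, --snpeffDb, --vepCacheVersion).
params {
  genome          = 'GRCm38'        // mouse reference added by this PR
  species         = 'mus_musculus'  // forwarded to VEP as --species
  snpeffDb        = 'GRCm38.86'     // GRCm38 value from conf/genomes.config
  vepCacheVersion = '98'            // GRCm38 value from conf/genomes.config
}
```

Such a file would be passed with Nextflow's `-c` option, in the same way the reference documentation describes for `-c conf/igenomes.config`.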
3 changes: 3 additions & 0 deletions main.nf
@@ -2247,6 +2247,7 @@ process VEP {
script:
reducedVCF = reduceVCF(vcf.fileName)
genome = params.genome == 'smallGRCh37' ? 'GRCh37' : params.genome

dir_cache = (params.vep_cache && params.annotation_cache) ? " \${PWD}/${dataDir}" : "/.vep"
cadd = (params.cadd_cache && params.cadd_WG_SNVs && params.cadd_InDels) ? "--plugin CADD,whole_genome_SNVs.tsv.gz,InDels.tsv.gz" : ""
genesplicer = params.genesplicer ? "--plugin GeneSplicer,/opt/conda/envs/nf-core-sarek-${workflow.manifest.version}/bin/genesplicer,/opt/conda/envs/nf-core-sarek-${workflow.manifest.version}/share/genesplicer-1.0-1/human,context=200,tmpdir=\$PWD/${reducedVCF}" : "--offline"
@@ -2257,6 +2258,7 @@ process VEP {
-i ${vcf} \
-o ${reducedVCF}_VEP.ann.vcf \
--assembly ${genome} \
+ --species ${params.species} \
${cadd} \
${genesplicer} \
--cache \
@@ -2318,6 +2320,7 @@ process VEPmerge {
-i ${vcf} \
-o ${reducedVCF}_VEP.ann.vcf \
--assembly ${genome} \
+ --species ${params.species} \
${cadd} \
${genesplicer} \
--cache \