Commit 9ac22d0

Merge pull request nf-core#639 from alneberg/dev
Beginners usage docs [skip ci]
2 parents 24673f2 + efe0c36 commit 9ac22d0

6 files changed

Lines changed: 359 additions & 210 deletions

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -12,6 +12,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
1212
- [#615](https://github.com/SciLifeLab/Sarek/pull/615) - Update documentation
1313
- [#620](https://github.com/SciLifeLab/Sarek/pull/620) - Add `tmp/` to `.gitignore`
1414
- [#625](https://github.com/SciLifeLab/Sarek/pull/625) - Add [`pathfindr`](https://github.com/NBISweden/pathfindr) as a submodule
15+
- [#639](https://github.com/SciLifeLab/Sarek/pull/639) - Add a complete example analysis to docs
1516

1617
### `Changed`
1718
- [#608](https://github.com/SciLifeLab/Sarek/pull/608) - Update Nextflow required version
@@ -24,6 +25,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
2425
- [#632](https://github.com/SciLifeLab/Sarek/pull/632) - Use 2 threads and 2 cpus FastQC processes
2526
- [#637](https://github.com/SciLifeLab/Sarek/pull/637) - Update tool version gathering
2627
- [#638](https://github.com/SciLifeLab/Sarek/pull/638) - Use correct `.simg` extension for Singularity images
28+
- [#639](https://github.com/SciLifeLab/Sarek/pull/639) - Smaller refactoring of the docs
2729

2830
### `Removed`
2931
- [#616](https://github.com/SciLifeLab/Sarek/pull/616) - Remove old Issue Template

README.md

Lines changed: 7 additions & 6 deletions
@@ -82,12 +82,13 @@ The Sarek pipeline comes with documentation in the `docs/` directory:
8282
06. [Configuration and profiles documentation](https://github.com/SciLifeLab/Sarek/blob/master/docs/CONFIG.md)
8383
07. [Intervals documentation](https://github.com/SciLifeLab/Sarek/blob/master/docs/INTERVALS.md)
8484
08. [Running the pipeline](https://github.com/SciLifeLab/Sarek/blob/master/docs/USAGE.md)
85-
09. [Examples](https://github.com/SciLifeLab/Sarek/blob/master/docs/USE_CASES.md)
86-
10. [TSV file documentation](https://github.com/SciLifeLab/Sarek/blob/master/docs/TSV.md)
87-
11. [Processes documentation](https://github.com/SciLifeLab/Sarek/blob/master/docs/PROCESS.md)
88-
12. [Documentation about containers](https://github.com/SciLifeLab/Sarek/blob/master/docs/CONTAINERS.md)
89-
13. [More information about ASCAT](https://github.com/SciLifeLab/Sarek/blob/master/docs/ASCAT.md)
90-
14. [Output documentation structure](https://github.com/SciLifeLab/Sarek/blob/master/docs/OUTPUT.md)
85+
09. [Command line parameters](https://github.com/SciLifeLab/Sarek/blob/master/docs/PARAMETERS.md)
86+
10. [Examples](https://github.com/SciLifeLab/Sarek/blob/master/docs/USE_CASES.md)
87+
11. [Input files documentation](https://github.com/SciLifeLab/Sarek/blob/master/docs/INPUT.md)
88+
12. [Processes documentation](https://github.com/SciLifeLab/Sarek/blob/master/docs/PROCESS.md)
89+
13. [Documentation about containers](https://github.com/SciLifeLab/Sarek/blob/master/docs/CONTAINERS.md)
90+
14. [More information about ASCAT](https://github.com/SciLifeLab/Sarek/blob/master/docs/ASCAT.md)
91+
15. [Output documentation structure](https://github.com/SciLifeLab/Sarek/blob/master/docs/OUTPUT.md)
9192

9293
## Contributions & Support
9394

docs/CONFIG.md

Lines changed: 20 additions & 4 deletions
@@ -5,7 +5,8 @@ For more informations on how to use configuration files, have a look at the [Nex
55
For more information about profiles, have a look at the [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html#config-profiles).
66

77
We provide several configuration files and profiles for Sarek.
8-
The standard ones are designed to work on a Swedish UPPMAX clusters, and can be modified and tailored to your own need.
8+
The standard ones are designed to work on a Swedish UPPMAX cluster, but can be modified and tailored to your own needs.
9+
910

1011
## Configuration files
1112

@@ -51,10 +52,14 @@ To be used for Travis (2 cpus) or on small computer for testing purpose
5152
Slurm configuration for a UPPMAX cluster
5253
Will run the workflow on `/scratch` using the Nextflow [`scratch`](https://www.nextflow.io/docs/latest/process.html#scratch) directive
5354

54-
## profiles
55+
## Profiles
56+
A profile is a convenient way of specifying which set of configuration files to use.
57+
The default profile is `standard`, but Sarek has several other predefined profiles, listed below, which can be selected with `-profile <profile>`:
58+
59+
```bash
60+
nextflow run SciLifeLab/Sarek --sample mysample.tsv -profile myprofile
61+
```
5562

56-
Every profile can be modified for your own use.
57-
To use a profile, you'll need to specify `-profile <profile>`
5863

5964
### `docker`
6065

@@ -82,3 +87,14 @@ Singularity images will be pulled automatically.
8287

8388
This is the profile for Singularity testing on a small machine, or on Travis CI.
8489
Singularity images will be pulled automatically.
90+
91+
## Customisation
92+
The recommended way to use custom settings is to supply Sarek with an additional configuration file. You can use the files in the [`conf/`](https://github.com/SciLifeLab/Sarek/tree/master/conf) directory as inspiration for this new `.config` file, and pass it to Nextflow with the `-c` flag:
93+
94+
```bash
95+
nextflow run SciLifeLab/Sarek --sample mysample.tsv -c conf/personal.config
96+
```
97+
98+
Any configuration field specified in this file takes precedence over the predefined configurations, while any field left out of the file is set by the configuration files included in the specified (or `standard`) profile.
99+
100+
To find out which configuration files are used by each profile, see the profile definitions in [`nextflow.config`](https://github.com/SciLifeLab/Sarek/blob/master/nextflow.config).
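
As a minimal sketch of the customisation workflow described above, the script below writes a small override file and prints the matching command. The file name `personal.config` and the values are hypothetical, and the assumption that `singleCPUMem` and `runTime` accept Nextflow memory and duration literals is not confirmed by this documentation; check the files in `conf/` before relying on it.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical file name, for illustration only.
config_file="personal.config"

# Write a small override file. Only the fields listed here replace the
# predefined ones; everything else still comes from the selected profile.
# The parameter names mirror the --singleCPUMem and --runTime command line
# options; the values and their literal format are assumptions.
cat > "$config_file" <<'EOF'
params {
  singleCPUMem = 8.GB
  runTime      = 12.h
}
EOF

# Print the command instead of running it, so the sketch works without Nextflow.
echo "nextflow run SciLifeLab/Sarek --sample mysample.tsv -c $config_file"
```

Keeping overrides in a separate file, rather than editing the files under `conf/`, means the predefined configuration stays untouched when Sarek is updated.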

docs/TSV.md renamed to docs/INPUT.md

Lines changed: 42 additions & 1 deletion
@@ -3,7 +3,7 @@
33
Input files for Sarek can be specified using a tsv file given to the `--sample` parameter. The tsv file is a Tab Separated Value file with columns: `subject gender status sample lane fastq1 fastq2` or `subject gender status sample bam bai`.
44
The content of these columns should be quite straightforward:
55

6-
- `subject` designate the subject, it should be the ID of the Patient, or if you don't have one, il could be the Normal ID Sample.
6+
- `subject` designates the subject; it should be the ID of the Patient or, if you don't have one, it could be the Normal ID Sample.
77
- `gender` is the gender of the Patient, (XX or XY)
88
- `status` is the status of the Patient, (0 for Normal or 1 for Tumor)
99
- `sample` designates the Sample; it should be the ID of the Sample (it is possible to have more than one tumor sample for each patient)
@@ -57,3 +57,44 @@ All the files will be in he Preprocessing/Recalibrated/ directory, and by defaul
5757
```bash
5858
nextflow run SciLifeLab/Sarek/somaticVC.nf --sample Preprocessing/Recalibrated/mysample.tsv --tools Mutect2,Strelka
5959
```
60+
61+
## Input FASTQ file name best practices
62+
63+
The input folder containing the FASTQ files for one individual (ID) should be organized into one subfolder per sample.
64+
All FASTQ files for a sample should be collected in that sample's subfolder.
65+
66+
```
67+
ID
68+
+--sample1
69+
+------sample1_lib_flowcell-index_lane_R1_1000.fastq.gz
70+
+------sample1_lib_flowcell-index_lane_R2_1000.fastq.gz
71+
+------sample1_lib_flowcell-index_lane_R1_1000.fastq.gz
72+
+------sample1_lib_flowcell-index_lane_R2_1000.fastq.gz
73+
+--sample2
74+
+------sample2_lib_flowcell-index_lane_R1_1000.fastq.gz
75+
+------sample2_lib_flowcell-index_lane_R2_1000.fastq.gz
76+
+--sample3
77+
+------sample3_lib_flowcell-index_lane_R1_1000.fastq.gz
78+
+------sample3_lib_flowcell-index_lane_R2_1000.fastq.gz
79+
+------sample3_lib_flowcell-index_lane_R1_1000.fastq.gz
80+
+------sample3_lib_flowcell-index_lane_R2_1000.fastq.gz
81+
```
82+
83+
FASTQ file name structure:
84+
85+
- `sample_lib_flowcell-index_lane_R1_1000.fastq.gz` and
86+
- `sample_lib_flowcell-index_lane_R2_1000.fastq.gz`
87+
88+
Where:
89+
90+
- `sample` = sample id
91+
- `lib` = identifier of the library preparation
92+
- `flowcell` = identifier of the flow cell for the sequencing run
93+
- `lane` = identifier of the lane of the sequencing run
94+
95+
Read group information is parsed from the FASTQ file names as follows:
96+
97+
- `RGID` = "sample_lib_flowcell_index_lane"
98+
- `RGPL` = "Illumina"
99+
- `PU` = sample
100+
- `RGLB` = lib
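
The mapping above can be sketched as a small shell function. This is an illustration only, not the parsing code Sarek itself uses: the helper name `read_group_from_fastq` and the example file name are hypothetical, and the sketch assumes that sample and library identifiers contain no underscores.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: derive read group fields from a FASTQ file name of the
# form sample_lib_flowcell-index_lane_R1_1000.fastq.gz.
read_group_from_fastq() {
  local name sample lib flowcell_index lane rest
  name="$(basename "$1")"
  # Split on underscores; the trailing fields (R1/R2, chunk, extension)
  # all end up in "rest" and are ignored.
  IFS='_' read -r sample lib flowcell_index lane rest <<< "$name"
  # RGID joins the fields with underscores, so the hyphen in
  # flowcell-index becomes an underscore as well.
  printf 'RGID=%s_%s_%s_%s\n' "$sample" "$lib" "${flowcell_index//-/_}" "$lane"
  printf 'RGPL=Illumina\n'
  printf 'PU=%s\n' "$sample"
  printf 'RGLB=%s\n' "$lib"
}

# Example with made-up identifiers.
read_group_from_fastq "ID/sample1/sample1_libA_FC01-ACGT_L001_R1_1000.fastq.gz"
```

Running a check like this over a folder before starting the pipeline is a cheap way to catch file names that do not follow the convention.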

docs/PARAMETERS.md

Lines changed: 139 additions & 0 deletions
@@ -0,0 +1,139 @@
1+
# Parameters
2+
3+
A list of all possible parameters that can be used with the different scripts included in Sarek.
4+
5+
## Common for all scripts
6+
7+
### --help
8+
9+
Display help
10+
11+
### --noReports
12+
13+
Disable all QC tools and MultiQC.
14+
15+
### --outDir
16+
17+
Choose an output directory
18+
19+
### --project `ProjectID`
20+
21+
Specify a project number ID on a UPPMAX cluster.
22+
(optional if not on such a cluster)
23+
24+
### --sample `file.tsv`
25+
26+
Use the given TSV file as sample input (see the [input files documentation](INPUT.md)).
27+
Not used by `annotate.nf` and `runMultiQC.nf`.
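
As a minimal sketch, a TSV file for FASTQ input can be generated with `printf`, using the column order `subject gender status sample lane fastq1 fastq2` from the input files documentation. All identifiers and file names below are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical output file name, IDs and paths, for illustration only.
tsv="mysample.tsv"

# One row per lane: subject, gender, status, sample, lane, fastq1, fastq2.
# printf reuses the format for each group of seven arguments and writes
# real tab characters between the columns.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  patient1 XX 0 normal1 lane1 normal1_R1.fastq.gz normal1_R2.fastq.gz \
  patient1 XX 1 tumor1  lane1 tumor1_R1.fastq.gz  tumor1_R2.fastq.gz \
  > "$tsv"

# Sanity check: every row must have exactly seven tab-separated columns.
awk -F'\t' 'NF != 7 { bad = 1 } END { exit bad }' "$tsv"
echo "wrote $(wc -l < "$tsv" | tr -d ' ') rows to $tsv"
```

Generating the file this way avoids a common problem with hand-edited TSV files, where an editor silently replaces tabs with spaces.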
28+
29+
### --tools `tool1[,tool2,tool3...]`
30+
31+
Choose which tools will be used in the workflow.
32+
Different tools to be separated by commas.
33+
Possible values are:
34+
35+
- haplotypecaller (use `HaplotypeCaller` for VC) (germlineVC.nf)
36+
- manta (use `Manta` for SV) (germlineVC.nf,somaticVC.nf)
37+
- strelka (use `Strelka` for VC) (germlineVC.nf,somaticVC.nf)
38+
- ascat (use `ASCAT` for CNV) (somaticVC.nf)
39+
- mutect2 (use `MuTect2` for VC) (somaticVC.nf)
40+
- snpeff (use `snpEff` for Annotation) (annotate.nf)
41+
- vep (use `VEP` for Annotation) (annotate.nf)
42+
43+
The `--tools` option is case insensitive, which avoids errors caused by capitalization when choosing tools.
44+
For example, `--tools mutect2,ascat` and `--tools MuTect2,ASCAT` are equivalent.
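
A wrapper script can apply the same case-insensitive matching before launching the pipeline. The sketch below is not how Sarek implements it; the helper `normalize_tools` is hypothetical and only reuses the tool names listed above.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Tool names taken from the list of possible values for --tools.
known_tools="haplotypecaller manta strelka ascat mutect2 snpeff vep"

# Hypothetical helper: lowercase a comma-separated --tools value and
# reject any name that is not in the list.
normalize_tools() {
  local lowered tool
  lowered="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  # Split on commas for the loop below; IFS is restored when the function returns.
  local IFS=','
  for tool in $lowered; do
    case " $known_tools " in
      *" $tool "*) ;;
      *) echo "unknown tool: $tool" >&2; return 1 ;;
    esac
  done
  printf '%s\n' "$lowered"
}

normalize_tools "MuTect2,ASCAT"
```

Rejecting unknown names up front gives a clearer error than a run that starts and then does nothing for a misspelled tool.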
45+
46+
### --verbose
47+
48+
Display more information about files being processed.
49+
50+
## Preprocessing script (`main.nf`)
51+
### --step `step`
52+
53+
Choose from which step the workflow will start.
54+
Choose only one step.
55+
Possible values are:
56+
57+
- mapping (default, will start workflow with FASTQ files)
58+
- recalibrate (will start workflow with BAM files and Recalibration Tables)
59+
60+
The `--step` option is case insensitive, which avoids errors caused by capitalization when choosing a step.
61+
62+
### --test
63+
64+
Test run Sarek on a smaller dataset. With this flag you don't have to specify `--sample Sarek-data/testdata/tsv/tiny.tsv`.
65+
66+
### --onlyQC
67+
68+
Run only QC tools and MultiQC to generate an HTML report.
69+
70+
71+
## Annotate script (`annotate.nf`)
72+
73+
### --annotateTools `tool1[,tool2,tool3...]`
74+
75+
Choose which tools' output to annotate.
76+
Different tools to be separated by commas.
77+
Possible values are:
78+
- haplotypecaller (Annotate `HaplotypeCaller` output)
79+
- manta (Annotate `Manta` output)
80+
- mutect2 (Annotate `MuTect2` output)
81+
- strelka (Annotate `Strelka` output)
82+
83+
### --annotateVCF `file1[,file2,file3...]`
84+
85+
Choose the VCF files to annotate.
86+
Separate multiple VCF files with commas.
87+
88+
89+
## MultiQC script (`runMultiQC.nf`)
90+
### --callName `Name`
91+
92+
Specify a name for MultiQC report (optional)
93+
94+
### --contactMail `email`
95+
96+
Specify an email for MultiQC report (optional)
97+
98+
99+
## References
100+
101+
For most use cases, the reference information is already in the configuration file [`conf/genomes.config`](https://github.com/SciLifeLab/Sarek/blob/master/conf/genomes.config).
102+
However, if needed, you can specify any reference file at the command line.
103+
104+
### --acLoci `acLoci file`
105+
106+
### --bwaIndex `bwaIndex file`
107+
108+
### --cosmic `cosmic file`
109+
110+
### --cosmicIndex `cosmicIndex file`
111+
112+
### --dbsnp `dbsnp file`
113+
114+
### --dbsnpIndex `dbsnpIndex file`
115+
116+
### --genomeDict `genomeDict file`
117+
118+
### --genomeFile `genomeFile file`
119+
120+
### --genomeIndex `genomeIndex file`
121+
122+
### --intervals `intervals file`
123+
124+
### --knownIndels `knownIndels file`
125+
126+
### --knownIndelsIndex `knownIndelsIndex file`
127+
128+
### --snpeffDb `snpeffDb file`
129+
130+
## Hardware Parameters
131+
132+
For most use cases, the hardware settings are already defined in the appropriate [configuration files](https://github.com/SciLifeLab/Sarek/blob/master/conf/).
133+
However, it is still possible to specify these parameters at the command line as well.
134+
135+
### --runTime `time`
136+
137+
### --singleCPUMem `memory`
138+
139+
### --totalMemory `memory`
