Commit 283b7fe

Merge pull request #175 from nf-core/docs-profile-update
Profile description improvement
2 parents 4bf820b + 889f8ef commit 283b7fe

3 files changed

Lines changed: 42 additions & 26 deletions


docs/configuration/adding_your_own.md

Lines changed: 18 additions & 9 deletions
@@ -1,14 +1,22 @@
 # nf-core/eager: Configuration for other clusters
 
-It is entirely possible to run this pipeline on other clusters, though you will need to set up your own config file so that the pipeline knows how to work with your cluster.
+## Introduction
 
-> If you think that there are other people using the pipeline who would benefit from your configuration (eg. other common cluster setups), please let us know. We can add a new configuration and profile which can used by specifying `-profile <name>` when running the pipeline.
+It is entirely possible to run this pipeline on your own cluster, though you will need to set up your own config file so that the pipeline knows how to work with it.
+
+### Personal Profiles
 
 If you are the only person to be running this pipeline, you can create your config file as `~/.nextflow/config` and it will be applied every time you run Nextflow. Alternatively, save the file anywhere and reference it when running the pipeline with `-c path/to/config` (see the [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for more).
 
 A basic configuration comes with the pipeline, which runs by default (the `standard` config profile - see [`conf/base.config`](../conf/base.config)). This means that you only need to configure the specifics for your system and overwrite any defaults that you want to change.
 
-## Cluster Environment
+### Institute Profiles
+
+In contrast, if you think that there are other people using the pipeline who would benefit from your configuration (e.g. other common cluster setups), you can create a config adapted to that cluster, which is centrally stored and maintained at [nf-core/configs](https://github.com/nf-core/configs). You can then specify `-profile <institute_name>` when running the pipeline, without making your own custom config file. Furthermore, the same profile can be used for other nf-core pipelines.
+
+## Creating your own profile
+
+### Cluster Environment
 By default, the pipeline uses the `local` Nextflow executor - in other words, all jobs are run in the login session. If you're using a simple server, this may be fine. If you're using a compute cluster, this is bad as all jobs will run on the head node.
 
 To specify your cluster environment, add the following line to your config file:
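The config line itself falls outside this hunk; as a sketch of the kind of block the paragraph describes (the executor and queue names here are placeholder assumptions - only the `clusterOptions = '-A myproject'` line actually appears in the diff below):

```nextflow
// Sketch only: executor and queue are placeholders for your own scheduler
process {
  executor = 'slurm'               // or 'sge', 'pbs', 'lsf', etc., to match your cluster
  queue = 'my_queue'               // hypothetical queue name
  clusterOptions = '-A myproject'  // extra scheduler flags, as shown in the next hunk
}
```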
@@ -27,11 +35,11 @@ process {
   clusterOptions = '-A myproject'
 }
 ```
-## Software Requirements
+### Software Requirements
 To run the pipeline, several software packages are required. How you satisfy these requirements is essentially up to you and depends on your system. If possible, we _highly_ recommend using either Docker or Singularity.
 Please see the [`installation documentation`](../installation.md) for how to run using the below as a one-off. These instructions are about configuring a config file for repeated use.
 
-### Docker
+#### Docker
 Docker is a great way to run nf-core/eager, as it manages all software installations and allows the pipeline to be run in an identical software environment across a range of systems.
 
 Nextflow has [excellent integration](https://www.nextflow.io/docs/latest/docker.html) with Docker, and beyond installing the two tools, not much else is required - at run time, Nextflow will automatically fetch the [nfcore/eager](https://hub.docker.com/r/nfcore/eager/) image that we have created and hosted on dockerhub.
@@ -46,7 +54,7 @@ process.container = "nfcore/eager"
 Note that the dockerhub organisation name annoyingly can't have a hyphen, so is `nfcore` and not `nf-core`.
 
 
-### Singularity image
+#### Singularity image
 Many HPC environments are not able to run Docker due to security issues.
 [Singularity](http://singularity.lbl.gov/) is a tool designed to run on such HPC systems which is very similar to Docker.
 
@@ -75,15 +83,15 @@ process.container = "/path/to/nf-core-eager.simg"
 By default Nextflow will store a singularity image in the working directory of a job. You can alternatively specify a 'central' singularity cache to keep all singularity containers for all users. This can be
 done by either setting a central environment variable `NXF_SINGULARITY_CACHEDIR` or specifying the location in a nextflow config file with `singularity.cacheDir`.
 
-### Conda
+#### Conda
 If you're not able to use Docker or Singularity, you can instead use conda to manage the software requirements.
 To use conda in your own config file, add the following:
 
 ```nextflow
 process.conda = "$baseDir/environment.yml"
 ```
 
-## Software Caches
+### Software Caches
 
 Each new version of a pipeline that is downloaded and run will pull down a new image (Docker/Singularity) or collection (conda) of all the software required for the pipeline. By default this will be placed in the `work/` directory of an EAGER run. When running lots of pipeline jobs, this can slow down the pipeline (having to download and create a new environment each time) and take up a lot of hard-disk space (as each run has its own duplicate of the environment).
 
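A central cache as described above can be pinned in a config file; a sketch with placeholder paths (the diff's own `conda { }` example continues in the next hunk):

```nextflow
// Sketch only: cache paths are placeholders for a shared location on your system
singularity {
  cacheDir = '/path/to/central/singularity_cache'  // shared image store across runs
}
conda {
  cacheDir = '/path/to/central/conda_envs'         // shared conda environments
}
```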
@@ -104,7 +112,8 @@ conda {
 }
 ```
 
-## Job Resources
+### Job Resources
+
 #### Automatic resubmission
 Each step in the pipeline has a default set of requirements for the number of CPUs, memory and time. For most of the steps in the pipeline, if the job exits with an error code of `143` (exceeded requested resources) it will automatically resubmit with higher requests (2 x original, then 3 x original). If it still fails after three times then the pipeline is stopped.
 
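The resubmission behaviour described above is the kind of thing Nextflow expresses with a dynamic `errorStrategy` closure; the following is a sketch of that general pattern, not the pipeline's actual `conf/base.config` (the base resource values are placeholders):

```nextflow
// Sketch of the retry-with-more-resources pattern; base values are placeholders
process {
  errorStrategy = { task.exitStatus == 143 ? 'retry' : 'finish' }
  maxRetries = 2                      // original attempt plus two resubmissions
  memory = { 8.GB * task.attempt }    // 1x, then 2x, then 3x the base request
  time   = { 2.h * task.attempt }
}
```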
docs/installation.md

Lines changed: 10 additions & 9 deletions
@@ -10,8 +10,7 @@ To start using the nf-core/eager pipeline, follow the steps below:
 3. [Pipeline configuration](#3-pipeline-configuration)
     * [Software deps: Docker and Singularity](#31-software-deps-docker-and-singularity)
     * [Software deps: Bioconda](#32-software-deps-bioconda)
-    * [Configuration profiles](#33-configuration-profiles)
-4. [Reference genomes](#4-reference-genomes)
+4. [Terminal configuration](#4-terminal-configuration)
 5. [Appendices](#appendices)
     * [Running on UPPMAX](#running-on-uppmax)
 
@@ -34,10 +33,10 @@ See [nextflow.io](https://www.nextflow.io/) for further instructions on how to i
 
 ## 2) Install the pipeline
 
-#### 2.1) Automatic
+### 2.1) Automatic
 This pipeline itself needs no installation - Nextflow will automatically fetch it from GitHub if `nf-core/eager` is specified as the pipeline name.
 
-#### 2.2) Offline
+### 2.2) Offline
 The above method requires an internet connection so that Nextflow can download the pipeline files. If you're running on a system that has no internet connection, you'll need to download and transfer the pipeline files manually:
 
 ```bash
@@ -54,7 +53,7 @@ To stop nextflow from looking for updates online, you can tell it to run in offl
 export NXF_OFFLINE='TRUE'
 ```
 
-#### 2.3) Development
+### 2.3) Development
 
 If you would like to make changes to the pipeline, it's best to make a fork on GitHub and then clone the files. Once cloned you can run the pipeline directly as above.
 
@@ -81,13 +80,15 @@ The following software is currently required to be installed:
 * [GATK](https://software.broadinstitute.org/gatk/)
 * [bamUtil](https://genome.sph.umich.edu/wiki/BamUtil)
 * [fastP](https://github.com/OpenGene/fastp)
+* [DamageProfiler](https://github.com/Integrative-Transcriptomics/DamageProfiler)
 
-#### 3.1) Software deps: Docker
+
+### 3.1) Software deps: Docker
 First, install Docker on your system: [Docker Installation Instructions](https://docs.docker.com/engine/installation/)
 
 Then, running the pipeline with the option `-profile standard,docker` tells Nextflow to enable Docker for this run. An image containing all of the software requirements will be automatically fetched from dockerhub (https://hub.docker.com/r/nfcore/eager) and used.
 
-#### 3.1) Software deps: Singularity
+### 3.2) Software deps: Singularity
 If you're not able to use Docker then [Singularity](http://sylabs.io) is a great alternative.
 The process is very similar: running the pipeline with the option `-profile standard,singularity` tells Nextflow to enable Singularity for this run. An image containing all of the software requirements will be automatically fetched from Singularity Hub and used.
 
@@ -106,13 +107,13 @@ nextflow run /path/to/nf-core-eager -with-singularity nf-core-eager.simg
 Remember to pull updated versions of the singularity image if you update the pipeline.
 
 
-#### 3.2) Software deps: conda
+### 3.3) Software deps: conda
 If you're not able to use Docker _or_ Singularity, you can instead use conda to manage the software requirements.
 This is slower and less reproducible than the above, but is still better than having to install all requirements yourself!
 The pipeline ships with a conda environment file and nextflow has built-in support for this.
 To use it, first ensure that you have conda installed (we recommend [miniconda](https://conda.io/miniconda.html)), then follow the same pattern as above and use the flag `-profile standard,conda`
 
-#### 4) Profile configuration
+## 4) Terminal configuration
 Nextflow handles job submissions on SLURM or other environments, and supervises running the jobs. Thus the Nextflow process must run until the pipeline is finished. We recommend that you put the process running in the background through `screen` / `tmux` or a similar tool. Alternatively you can run Nextflow within a cluster job submitted to your job scheduler.
 
 It is recommended to limit the Nextflow Java virtual machine's memory. We recommend adding the following line to your environment (typically in `~/.bashrc` or `~/.bash_profile`):

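The line itself is cut off by the hunk boundary, but it appears verbatim in the block removed from `docs/usage.md` elsewhere in this commit; it caps the JVM heap at 4 GB with a 1 GB initial size:

```bash
# Limit the Nextflow JVM heap (values taken from the removed docs/usage.md block)
export NXF_OPTS='-Xms1g -Xmx4g'
```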
docs/usage.md

Lines changed: 14 additions & 8 deletions
@@ -29,16 +29,16 @@ screen -r eager2
 ```
 To end the screen session while in it, type `exit`.
 
-It is recommended to limit the Nextflow Java virtual machines memory. We recommend adding the following line to your environment (typically in `~/.bashrc` or `~./bash_profile`):
 
-```bash
-NXF_OPTS='-Xms1g -Xmx4g'
-```
 ## Help Message
 To access the nextflow help message run: `nextflow run -help`
 
 ## Running the pipeline
+
+> Before you start, you should change into the directory you wish your results to go in. When you start the Nextflow job, it will place all the 'working' folders in the current directory, and NOT necessarily the directory the output files will end up in.
+
 The typical command for running the pipeline is as follows:
+
 ```bash
 nextflow run nf-core/eager --reads '*_R{1,2}.fastq.gz' --fasta 'some.fasta' -profile standard,docker
 ```
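To illustrate the working-directory note in the blockquote above (the results path is a placeholder, and the launch line is left as a comment since it requires Nextflow and the pipeline to be installed):

```bash
# Change into the directory you want results (and Nextflow's work/ folders) in first
results_dir="$(mktemp -d)/eager_results"   # placeholder results location
mkdir -p "$results_dir" && cd "$results_dir"
# nextflow run nf-core/eager --reads '*_R{1,2}.fastq.gz' --fasta 'some.fasta' -profile standard,docker
pwd   # Nextflow would create its work/ directory here
```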
@@ -75,10 +75,14 @@ This version number will be logged in reports when you run the pipeline, so that
 
 ### `-profile`
 
-Use this parameter to choose a configuration profile. Profiles can give configuration presets for different computing environments. Note that multiple profiles can be loaded, for example: `-profile standard,docker` - the order of arguments is important!
+Use this parameter to choose a configuration profile. Profiles can give configuration presets for different computing environments (e.g. schedulers, software environments, memory limits, etc.). Note that multiple profiles can be loaded, for example: `-profile standard,docker` - the order of arguments is important! The first entry takes precedence over the others, e.g. if a setting is set by both the first and second profile, the first entry will be used and the second entry ignored.
+
+> *Important*: If running EAGER2 on a cluster, ask your system administrator what profile to use.
+
+For more details on how to set up your own private profile, please see [installation](../configuration/adding_your_own.md).
 
 **Basic profiles**
-These are basic profiles which primarily define where you derive the pipeline's software packages from. These are typically the profiles you would use if you are running the pipeline on your own PC (vs. a HPC cluster).
+These are basic profiles which primarily define where you derive the pipeline's software packages from. These are typically the profiles you would use if you are running the pipeline on your **own PC** (vs. an HPC cluster - see below).
 
 * `standard`
     * The default profile, used if `-profile` is not specified at all.
@@ -99,9 +103,9 @@ These are basic profiles which primarily define where you derive the pipeline's
     * Includes links to test data so needs no other parameters
 * `none`
     * No configuration at all. Useful if you want to build your own config from scratch and want to avoid loading in the default `base` config profile (not recommended).
-
+
 **Institution Specific Profiles**
-These are profiles specific to certain clusters, and are centrally maintained at [nf-core/configs](`https://github.com/nf-core/configs`). Those listed below are regular users of EAGER2, if you don't see your own institution here check the [nf-core/configs](`https://github.com/nf-core/configs`) repository.
+These are profiles specific to certain **HPC clusters**, and are centrally maintained at [nf-core/configs](https://github.com/nf-core/configs). Those listed below are regular users of EAGER2; if you don't see your own institution here, check the [nf-core/configs](https://github.com/nf-core/configs) repository.
 
 * `uzh`
     * A profile for the University of Zurich Research Cloud
@@ -113,6 +117,8 @@ These are profiles specific to certain clusters, and are centrally maintained a
     * A profile for the SDAG cluster at the Department of Archaeogenetics of the Max-Planck-Institute for the Science of Human History
     * Loads Singularity and defines appropriate resources for running the pipeline
 
+
+
 ### `--reads`
 Use this to specify the location of your input FastQ files. The files may be from either a single sample or multiple samples. For example:
 
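The doc's own example is cut off at the end of this hunk; as an aside, the `{1,2}` pairing glob used in the typical command above can be demonstrated with dummy files (the filenames here are hypothetical):

```bash
# Dummy paired-end files showing what a '*_R{1,2}.fastq.gz' pattern matches
demo_dir="$(mktemp -d)" && cd "$demo_dir"
touch sampleA_R1.fastq.gz sampleA_R2.fastq.gz sampleB_R1.fastq.gz sampleB_R2.fastq.gz
ls *_R{1,2}.fastq.gz   # brace expansion matches all four; Nextflow groups them per sample
```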