CaTCHseq pipeline
================
Overview
--------
The CaTCHseq pipeline processes PCR-amplified CaTCH libraries from FASTQ files through read mapping, barcode collapsing/deduplication, multiplet handling, and reporting. It is built with Nextflow (DSL2) and supports either CellRanger or STARsolo for mapping. Deduplication uses umi_tools network-based clustering, with configurable clustering methods and distance cutoffs for both CaTCH barcodes and UMIs.
Quick start (Docker-based defaults)
-----------------------------------
```bash
nextflow run nextflow/main.nf \
--libraries <libraries.csv> \
--outputDir ./CaTCHseq_OUTPUT \
--reportsDir ./REPORTS
```
Key inputs
----------
- `--libraries` (CSV, required): with columns SampleName, Condition, Replicate, LibraryType (GEX|CaTCHseq), R1, R2, CellNumber, Chemistry.
- Reference inputs depend on the mapper:
  - CellRanger: `--index` (transcriptome); `--reference` and `--annotation` as needed to build the index.
  - STARsolo: `--index`, `--reference`, `--annotation`, and optionally `--whitelist`.
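A minimal `libraries.csv` might look like the one written by the sketch below. Every sample name, path, cell number, and `Chemistry` value is a hypothetical placeholder: the accepted chemistry strings are not listed in this README, so `SC3Pv3` is only an assumed example, and pairing a GEX and a CaTCHseq library under one `SampleName` is likewise an assumption about how libraries are matched.

```bash
#!/usr/bin/env bash
# Write a minimal libraries.csv with the columns listed under "Key inputs".
# All values are hypothetical placeholders; replace them with your own.
set -euo pipefail

out="${1:-libraries.csv}"

cat > "$out" <<'EOF'
SampleName,Condition,Replicate,LibraryType,R1,R2,CellNumber,Chemistry
sampleA,control,1,GEX,/data/sampleA_GEX_R1.fastq.gz,/data/sampleA_GEX_R2.fastq.gz,5000,SC3Pv3
sampleA,control,1,CaTCHseq,/data/sampleA_CaTCH_R1.fastq.gz,/data/sampleA_CaTCH_R2.fastq.gz,5000,SC3Pv3
EOF

# Sanity check: every row needs exactly 8 comma-separated fields.
awk -F',' 'NF != 8 { printf "line %d has %d fields\n", NR, NF; bad = 1 } END { exit (bad ? 1 : 0) }' "$out"
echo "wrote $out"
```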
Major parameters
----------------
- General
  - `--mapper` (CellRanger|STAR) selects the mapper; mapper-specific options go in `--cellranger_params`, `--star_params`, and `--idx_params`.
  - `--withQC` runs FastQC and MultiQC; `--fastqc_params` passes extra FastQC options.
  - `--chunkSize` sets the read chunk size used for barcode counting.
  - `--minReads` sets the minimum number of reads per CaTCH barcode; `--filter` switches to filtered counts.
  - `--min_detected_barcodes`, `--singlet_cutoff`, `--bc1_cutoff`, and `--bc2_cutoff` set thresholds for downstream classification.
- Barcode/UMI collapsing (umi_tools-backed)
  - Distance cutoffs:
    - `--maxDist`: global fallback distance (default 1).
    - `--maxDistCaTCH`: distance for CaTCH barcode collapsing (falls back to `--maxDist` if unset; default 1).
    - `--maxDistUMIs`: distance for UMI collapsing (falls back to `--maxDist` if unset; default 1).
  - Network methods:
    - `--clusterMethodCaTCH`: `directional`, `adjacency`, or `cluster` (default `directional`).
    - `--clusterMethodUMIs`: `directional`, `adjacency`, or `cluster` (default `directional`).
  - Uniqueness toggle: `--uniqueCaTCH` (true/false) enables umi_tools-based collapsing; when disabled, collapsing uses plain Hamming distance.
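To make the distance and method parameters concrete, the sketch below defines Hamming distance between two equal-length barcodes and the edge rule that umi_tools describes for its `directional` method: barcode A absorbs neighbour B when their distance is within the cutoff and `count_A >= 2 * count_B - 1`. The `adjacency` and `cluster` methods group nodes differently and are not shown. The function names are hypothetical, and this is an illustration, not code from `collapseCaTCHbarcodes.py`.

```bash
#!/usr/bin/env bash
# Illustration of the quantities behind the collapsing parameters.
# hamming and directional_edge are hypothetical helpers, not pipeline code.

# Number of positions at which two equal-length barcodes differ.
hamming() {
  local a=$1 b=$2 d=0 i
  if (( ${#a} != ${#b} )); then
    echo "hamming: barcodes differ in length" >&2
    return 2
  fi
  for (( i = 0; i < ${#a}; i++ )); do
    if [[ ${a:i:1} != "${b:i:1}" ]]; then
      d=$((d + 1))
    fi
  done
  echo "$d"
}

# umi_tools-style "directional" edge: barcode A (count_a) may absorb barcode B
# (count_b) if dist(A, B) <= max_dist and count_a >= 2 * count_b - 1.
directional_edge() {
  local a=$1 count_a=$2 b=$3 count_b=$4 max_dist=${5:-1} d
  d=$(hamming "$a" "$b") || return 2
  (( d <= max_dist && count_a >= 2 * count_b - 1 ))
}

hamming ACGTACGT ACGTACGA                                        # prints 1
directional_edge ACGTACGT 50 ACGTACGA 3 && echo "collapse"       # 50 >= 5
directional_edge ACGTACGT 5 ACGTACGA 4 || echo "keep separate"   # 5 < 7
```

Raising the distance cutoff lets more distant barcodes join a cluster, while the count condition keeps two similarly abundant barcodes apart even when they differ by a single base.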
Pipeline steps (high level)
---------------------------
1) **QC** (optional): FastQC → MultiQC summaries.
2) **Mapping**: CellRanger count or STARsolo (with chemistry-specific presets).
3) **Barcode counting**: Count CaTCH barcodes in chunks and merge.
4) **Collapse & filter**: Deduplicate CaTCH barcodes and UMIs using the configured distance cutoffs and clustering methods, and remove background.
5) **Multiplet resolution**: Merge multiplets by majority vote.
6) **Reports & tables**: Generate CaTCH barcode and cell summaries plus analytics plots.
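The chunked counting in step 3 can be illustrated with coreutils: split the barcode stream into chunks (the role `--chunkSize` plays), count each chunk independently, then sum per-barcode counts. This is a generic sketch assuming a non-empty text file with one already-extracted CaTCH barcode per line; `count_in_chunks` is a hypothetical helper, and barcode extraction from the FASTQ files is not shown.

```bash
#!/usr/bin/env bash
# Generic "count in chunks, then merge" sketch using coreutils and awk.
# Input: a non-empty text file with one CaTCH barcode per line (assumed).
set -euo pipefail

count_in_chunks() {
  local input=$1 chunk_lines=$2 workdir f
  workdir=$(mktemp -d)
  # 1. Split the barcode list into chunks of at most chunk_lines lines.
  split -l "$chunk_lines" "$input" "$workdir/chunk_"
  # 2. Count barcodes per chunk; chunks are independent, so this could run in parallel.
  for f in "$workdir"/chunk_*; do
    sort "$f" | uniq -c | awk '{ print $2 "\t" $1 }' > "$f.counts"
  done
  # 3. Merge: sum counts per barcode across chunks, highest count first.
  cat "$workdir"/chunk_*.counts \
    | awk -F'\t' '{ n[$1] += $2 } END { for (b in n) print b "\t" n[b] }' \
    | sort -t $'\t' -k2,2nr -k1,1
  rm -rf "$workdir"
}

# Example: six barcodes counted in chunks of two lines.
tmp=$(mktemp)
printf '%s\n' AAAA CCCC AAAA GGGG AAAA CCCC > "$tmp"
count_in_chunks "$tmp" 2   # prints AAAA 3, CCCC 2, GGGG 1 (tab-separated, one per line)
rm -f "$tmp"
```

Because per-chunk counts are simply summed, the merged result does not depend on the chunk size; chunking only bounds the memory and work per task.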
CLI mapping for collapse script
-------------------------------
- Nextflow parameters map to `collapseCaTCHbarcodes.py` options as follows:
  - `--clusterMethodCaTCH` → `--cluster-method-catch`
  - `--clusterMethodUMIs` → `--cluster-method-umis`
  - `--maxDistCaTCH` → `--maxdist-catch`
  - `--maxDistUMIs` → `--maxdist-umis`
- If unset, all four fall back to the original behaviour (`directional` clustering, distance 1).
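The parameter mapping and its fallbacks can be sketched as a small shell helper. Here environment variables stand in for the Nextflow parameters; `MAX_DIST`, `MAX_DIST_CATCH`, `MAX_DIST_UMIS`, `CLUSTER_METHOD_CATCH`, `CLUSTER_METHOD_UMIS`, and the function `build_collapse_args` are all hypothetical names. Only the four documented script options are emitted; any other arguments the script needs are not shown.

```bash
#!/usr/bin/env bash
# Hypothetical helper: translate parameter values (read from environment
# variables standing in for the Nextflow params) into the four documented
# collapseCaTCHbarcodes.py options, applying the documented fallbacks.
build_collapse_args() {
  local max_dist=${MAX_DIST:-1} m
  local -a args
  # Per-type distances fall back to MAX_DIST, which itself defaults to 1.
  args=(
    --cluster-method-catch "${CLUSTER_METHOD_CATCH:-directional}"
    --cluster-method-umis "${CLUSTER_METHOD_UMIS:-directional}"
    --maxdist-catch "${MAX_DIST_CATCH:-$max_dist}"
    --maxdist-umis "${MAX_DIST_UMIS:-$max_dist}"
  )
  # Reject methods other than the three listed for the pipeline.
  for m in "${args[1]}" "${args[3]}"; do
    case $m in
      directional|adjacency|cluster) ;;
      *) echo "unsupported cluster method: $m" >&2; return 1 ;;
    esac
  done
  printf '%s\n' "${args[*]}"
}

build_collapse_args
# --cluster-method-catch directional --cluster-method-umis directional --maxdist-catch 1 --maxdist-umis 1
MAX_DIST=2 CLUSTER_METHOD_UMIS=adjacency build_collapse_args
# --cluster-method-catch directional --cluster-method-umis adjacency --maxdist-catch 2 --maxdist-umis 2
```

Setting only the global distance changes both cutoffs, while setting a per-type value overrides just that one.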
Outputs
-------
- `OUTPUT/Counts/` – intermediate and collapsed `.sclib` libraries and stats.
- `OUTPUT/Reports/` – tables (`*.CaTCHbarcodes`, `*.cells`) and plots.
- `OUTPUT/CellRanger/` or `OUTPUT/STAR/` – mapper-specific outputs.
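A quick post-run check of this layout could look like the sketch below. It assumes `OUTPUT` is the directory passed to `--outputDir` and that report tables land under `OUTPUT/Reports/`; if `--reportsDir` places them elsewhere, the check needs adjusting. `check_outputs` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: confirm the expected output folders exist and list
# the report tables. Returns non-zero if any expected folder is missing.
check_outputs() {
  local out=$1 missing=0 d
  for d in Counts Reports; do
    if [[ ! -d $out/$d ]]; then
      echo "missing: $out/$d" >&2
      missing=1
    fi
  done
  # Exactly which mapper folder exists depends on --mapper.
  if [[ ! -d $out/CellRanger && ! -d $out/STAR ]]; then
    echo "missing: $out/CellRanger or $out/STAR" >&2
    missing=1
  fi
  # List report tables (*.CaTCHbarcodes, *.cells), if any were written.
  find "$out/Reports" \( -name '*.CaTCHbarcodes' -o -name '*.cells' \) 2>/dev/null | sort
  return "$missing"
}
```

For example, `check_outputs ./CaTCHseq_OUTPUT` after the quick-start run.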
Tips
----
- If your chemistry deviates from the 10x Genomics presets, provide chemistry-specific STARsolo parameters.
- Ensure the paths in `--libraries` are accessible wherever Nextflow executes tasks (on the local filesystem, or mounted into the work directory when running in containers).
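The path check can be done before launching. The sketch below assumes the column order given under Key inputs (R1 in column 5, R2 in column 6), a header row, and plain CSV without quoted fields or commas in paths; `check_fastq_paths` is a hypothetical helper. It only tests readability on the machine where it runs, which may differ from a cluster node or container.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report R1/R2 paths in libraries.csv that are not
# readable from the current machine. Assumes a header row, R1 in column 5,
# R2 in column 6, and plain CSV (no quoted fields, no commas in paths).
check_fastq_paths() {
  local csv=$1 status=0 sample condition replicate libtype r1 r2 rest f
  # "|| [[ -n $sample ]]" also processes a last line lacking a trailing newline.
  while IFS=, read -r sample condition replicate libtype r1 r2 rest || [[ -n $sample ]]; do
    for f in "$r1" "$r2"; do
      if [[ ! -r $f ]]; then
        echo "not readable ($sample): $f" >&2
        status=1
      fi
    done
  done < <(tail -n +2 "$csv")
  return "$status"
}
```

Run it as `check_fastq_paths libraries.csv` before `nextflow run`; a non-zero exit status means at least one FASTQ path needs fixing.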