Commit fa49f46

restrict pipeline to use one ip replicate and control replicate, update usage on sample sheet design
1 parent 86bd82c commit fa49f46

3 files changed

Lines changed: 53 additions & 0 deletions

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -21,6 +21,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - [[PR #493](https://github.com/nf-core/chipseq/pull/493)] - Follow up to #487.
 - [[#492](https://github.com/nf-core/chipseq/issues/492), [#417](https://github.com/nf-core/chipseq/issues/417)] - Refactor local modules to nf-core standard.
 - [[#416](https://github.com/nf-core/chipseq/issues/416)] - Moved the KHMER_UNIQUEKMERS logic to prepare_genome
+- [[#510](https://github.com/nf-core/chipseq/issues/510)] - Restrict the usage to one IP replicate against one control replicate,
+  see: [#440](https://github.com/nf-core/chipseq/issues/440).

 ### Parameters

bin/check_samplesheet.py

Lines changed: 9 additions & 0 deletions
@@ -212,9 +212,12 @@ def check_samplesheet(file_in, file_out):
         sample,
     )

+set_control_replicates = set()
 for idx, val in enumerate(sample_mapping_dict[sample][replicate]):
     control = "_REP".join(val[-1].split("_REP")[:-1])
     control_replicate = val[-1].split("_REP")[-1]
+    set_control_replicates.add(control_replicate)  # add(), not update(): update() would split "10" into "1" and "0"
+
     if control and (
         control not in sample_mapping_dict.keys()
         or int(control_replicate) not in sample_mapping_dict[control].keys()
@@ -225,6 +228,12 @@ def check_samplesheet(file_in, file_out):
             val[-1],
         )

+# Check that a given sample-replicate has only one control replicate
+if len(set_control_replicates) > 1:
+    print_error(
+        f"Sample: {sample}, replicate {replicate} has more than one control replicate! Revise the experimental design, see: 'Note on IP and control replicates'"
+    )
+
 ## Write to file
 for idx in range(len(sample_mapping_dict[sample][replicate])):
     fastq_files = sample_mapping_dict[sample][replicate][idx]
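
The check added in this hunk can be sketched in isolation as follows. This is a minimal illustration, not the pipeline's code: it assumes control identifiers have the form `<control>_REP<n>` (as the `split("_REP")` calls above suggest), and the helper names `control_replicates` and `has_single_control_replicate` are hypothetical. Note that `set.update` given a string adds each character separately, so the sketch uses `set.add` to keep multi-digit replicate ids whole.

```python
def control_replicates(control_ids):
    """Collect the distinct replicate ids from '<control>_REP<n>' strings.

    Hypothetical helper; the input format is an assumption based on the diff.
    """
    replicates = set()
    for control_id in control_ids:
        if not control_id:
            continue  # row without a control, e.g. an input sample
        # Same split as in the hunk: the text after the last "_REP".
        replicates.add(control_id.split("_REP")[-1])
    return replicates


def has_single_control_replicate(control_ids):
    """True when all runs of one sample-replicate share one control replicate."""
    return len(control_replicates(control_ids)) <= 1


# One IP replicate against one control replicate: allowed.
ok = has_single_control_replicate(["WT_INPUT_REP1", "WT_INPUT_REP1"])
# One IP replicate against two control replicates: rejected.
bad = has_single_control_replicate(["WT_INPUT_REP1", "WT_INPUT_REP2"])

# set.update() iterates over its argument, so a string is split into characters.
split_chars = set()
split_chars.update("10")
```

With `set.update`, a sample whose only control replicate is `10` would be counted as having two control replicates (`"1"` and `"0"`), which is why the sketch stores the whole string.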

docs/usage.md

Lines changed: 42 additions & 0 deletions
@@ -47,6 +47,48 @@ WT_INPUT,BLA203A30_S21_L002_R1_001.fastq.gz,,2,,,
 WT_INPUT,BLA203A31_S21_L003_R1_001.fastq.gz,,3,,,
 ```

+### Note on IP and control replicates
+
+The pipeline is designed to pair each IP replicate with a single matching control replicate (see the section above). However, there can be
+situations where one might want to compare the IP sample against several different controls. In
+those cases, it is advisable to encode these comparisons either in the sample column or as another replicate.
+
+- Encoding in sample names:
+
+```csv title="samplesheet.csv"
+sample,fastq_1,fastq_2,replicate,antibody,control,control_replicate
+WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,1
+WT_BCATENIN_IP_CONTROL_2,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,2
+WT_BCATENIN_IP_CONTROL_3,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,3
+WT_INPUT,BLA203A6_S32_L006_R1_001.fastq.gz,,1,,,
+WT_INPUT,BLA203A30_S21_L001_R1_001.fastq.gz,,2,,,
+WT_INPUT,BLA203A31_S21_L003_R1_001.fastq.gz,,3,,,
+```
+
+- Encoding as new biological replicates:
+
+```csv title="samplesheet.csv"
+sample,fastq_1,fastq_2,replicate,antibody,control,control_replicate
+WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,1
+WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,2,BCATENIN,WT_INPUT,2
+WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,3,BCATENIN,WT_INPUT,3
+WT_INPUT,BLA203A6_S32_L006_R1_001.fastq.gz,,1,,,
+WT_INPUT,BLA203A30_S21_L001_R1_001.fastq.gz,,2,,,
+WT_INPUT,BLA203A31_S21_L003_R1_001.fastq.gz,,3,,,
+```
+
+- The following design, one IP replicate against more than one control replicate, is not allowed:
+
+```csv title="samplesheet.csv"
+sample,fastq_1,fastq_2,replicate,antibody,control,control_replicate
+WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,1
+WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,2
+WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,3
+WT_INPUT,BLA203A6_S32_L006_R1_001.fastq.gz,,1,,,
+WT_INPUT,BLA203A30_S21_L001_R1_001.fastq.gz,,2,,,
+WT_INPUT,BLA203A31_S21_L003_R1_001.fastq.gz,,3,,,
+```
+
 ### Full design

 The pipeline will auto-detect whether a sample is single- or paired-end using the information provided in the samplesheet. The samplesheet can have as many columns as you desire, however, there is a strict requirement for the first 7 columns to match those defined in the table below.
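
For the "Note on IP and control replicates" section added in this hunk, the following sketch shows one way a disallowed samplesheet could be detected and re-encoded using the "sample names" convention. It is an illustration under assumptions, not part of the pipeline: the helpers `find_conflicts` and `encode_in_sample_names` are hypothetical, and the `_CONTROL_<n>` suffix simply mirrors the example rows in the docs.

```python
import csv
import io
from collections import defaultdict

# The disallowed design from the docs: one IP replicate, three control replicates.
DISALLOWED = """\
sample,fastq_1,fastq_2,replicate,antibody,control,control_replicate
WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,1
WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,2
WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,3
WT_INPUT,BLA203A6_S32_L006_R1_001.fastq.gz,,1,,,
WT_INPUT,BLA203A30_S21_L001_R1_001.fastq.gz,,2,,,
WT_INPUT,BLA203A31_S21_L003_R1_001.fastq.gz,,3,,,
"""


def find_conflicts(rows):
    """Return {(sample, replicate): [control replicates]} for pairs with more than one."""
    seen = defaultdict(list)
    for row in rows:
        key = (row["sample"], row["replicate"])
        # Rows without a control (input samples) are skipped entirely.
        if row["control"] and row["control_replicate"] not in seen[key]:
            seen[key].append(row["control_replicate"])
    return {key: reps for key, reps in seen.items() if len(reps) > 1}


def encode_in_sample_names(rows):
    """Rename conflicting rows to '<sample>_CONTROL_<n>', keeping the first as is."""
    conflicts = find_conflicts(rows)
    fixed = []
    for row in rows:
        row = dict(row)  # copy so the input rows are left untouched
        reps = conflicts.get((row["sample"], row["replicate"]))
        if reps and row["control_replicate"] != reps[0]:
            row["sample"] = f'{row["sample"]}_CONTROL_{row["control_replicate"]}'
        fixed.append(row)
    return fixed


rows = list(csv.DictReader(io.StringIO(DISALLOWED)))
conflicts = find_conflicts(rows)
fixed_rows = encode_in_sample_names(rows)
fixed_samples = [row["sample"] for row in fixed_rows]
```

After re-encoding, each sample-replicate pair in `fixed_rows` points at a single control replicate, so running `find_conflicts` on it again finds nothing to report.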
