Discussion: Genotyping with pileupcaller #458

It seems to me like the current implementation is aiming to genotype each sample individually, producing _n_ genotype datasets. Is that so?

This approach would not be ideal for end users, who would then need to merge all these datasets themselves. Merging is usually done two datasets at a time, so combining the genotypes of _n_ individuals would require an additional _n-1_ sequential merging jobs that run outside of eager.

On the other hand, genotyping all individuals together would prevent running single-stranded and double-stranded libraries in the same run, since pileupCaller's `--singleStrandMode` applies to the entire set of samples being genotyped.

Rather than leaving users to run multiple extra jobs themselves, running those jobs for them in the background (which, if possible at all, would increase runtime considerably, since the merges are not fully parallelisable), or giving up the advantages of `--singleStrandMode`, I propose we do one of the following:
 a) Do not merge single- and double-stranded libraries from the same sample into a single BAM file, and genotype each group separately. We can then provide the user with two genotype datasets (one for single-stranded and one for double-stranded libraries), even if some individuals have data in both.
 b) Block users from submitting batches that contain both single- and double-stranded libraries. This is the easiest option to implement, but also the least useful.
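
To make option a) a bit more concrete, here is a rough Python sketch of the grouping step. It is only an illustration, not how eager does or should implement it: the record layout `(sample, library, strandedness)`, the sample names, and the helpers `group_by_strandedness` and `extra_flags` are all made up for this example. The only pileupCaller option used is `--singleStrandMode`, which is the one discussed above.

```python
from collections import defaultdict


def group_by_strandedness(libraries):
    """Collect sample names per strandedness, keeping first-seen order.

    `libraries` is a hypothetical list of (sample, library, strandedness)
    tuples; a sample with both library types appears in both groups.
    """
    groups = defaultdict(list)
    for sample, _library, strandedness in libraries:
        if sample not in groups[strandedness]:
            groups[strandedness].append(sample)
    return dict(groups)


def extra_flags(strandedness):
    """Return the extra pileupCaller option for a group (assumed labels)."""
    return ["--singleStrandMode"] if strandedness == "single" else []


# Hypothetical input: IND001 has one double- and one single-stranded library.
libraries = [
    ("IND001", "lib1", "double"),
    ("IND001", "lib2", "single"),
    ("IND002", "lib1", "double"),
    ("IND003", "lib1", "single"),
]

groups = group_by_strandedness(libraries)
for strandedness, samples in groups.items():
    # One genotyping job per group instead of one per sample.
    print(strandedness, samples, extra_flags(strandedness))
```

With this grouping there would be at most two genotyping jobs and two output datasets per run, regardless of _n_, and `--singleStrandMode` would only ever be applied to the single-stranded group.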

Any other ideas? Maybe I am overlooking something?
