He Yu (a colleague) has developed a small tool to "smush" together genotypes from different libraries of the same individual. Some version of this could be implemented in eager to solve the issues with ssDNA vs dsDNA genotyping with pileupCaller. It would allow users to keep genotypes called with --singleStrandMode while filling in any missing data from positions that are genotyped in an accompanying dsDNA library of the same individual.
The (soon to be updated) implementation of pileupCaller will create two genotyping datasets, one for dsDNA and one for ssDNA libraries. Adding a genotype "smushing" option would allow us to provide a single dataset with a single entry for each "duplicated" individual that combines the best of the data from both library types.
The current implementation of He's tool randomly picks one of the genotypes at positions where both versions of an individual are genotyped. It might be good to add an option to override this behaviour (so that dsDNA or ssDNA calls are used whenever the user has a preference) before it is implemented in eager.
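To make the proposed behaviour concrete, here is a minimal Python sketch of the merging logic. It is not He's tool and does not reflect its actual interface: the function name `smush_genotypes` and the `prefer` argument are hypothetical, and it assumes genotypes are given as EIGENSTRAT-style integer codes per SNP, with 9 meaning missing.

```python
import random

MISSING = 9  # EIGENSTRAT code for a missing genotype


def smush_genotypes(ss_calls, ds_calls, prefer=None, rng=None):
    """Merge per-SNP calls from an ssDNA and a dsDNA library of one individual.

    Hypothetical sketch: `prefer` may be None (pick randomly when both
    libraries are genotyped), "ss" or "ds" (always take that library's call
    when both are genotyped).
    """
    if len(ss_calls) != len(ds_calls):
        raise ValueError("both libraries must cover the same SNP list")
    if prefer not in (None, "ss", "ds"):
        raise ValueError("prefer must be None, 'ss' or 'ds'")
    rng = rng or random.Random()

    merged = []
    for ss, ds in zip(ss_calls, ds_calls):
        if ss == MISSING:
            merged.append(ds)  # fill from dsDNA (may itself be missing)
        elif ds == MISSING:
            merged.append(ss)  # only the ssDNA library has a call
        elif prefer == "ss":
            merged.append(ss)
        elif prefer == "ds":
            merged.append(ds)
        else:
            merged.append(rng.choice((ss, ds)))  # random pick, as described above
    return merged


ss = [0, 9, 2, 9]
ds = [2, 0, 9, 9]
print(smush_genotypes(ss, ds, prefer="ss"))  # [0, 0, 2, 9]
print(smush_genotypes(ss, ds, prefer="ds"))  # [2, 0, 2, 9]
```

With `prefer=None` the first position would come out as either 0 or 2, which is the random behaviour described above; the other positions are unaffected by the preference because at most one library has a call there.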
I will look into packaging the tool for conda and implementing it in eager after I discuss further with He.