Following on from a discussion on Slack with Harshil Patel, I wanted to raise an issue regarding the genome size needed for the --macs_gsize option when running the nf-core/chipseq pipeline (and others).
I shall be using genomes not available in iGenomes, so I need to calculate this value myself. To check that I could do this correctly, I tried to reproduce the values for human and mouse reported at: https://github.com/nf-core/chipseq/blob/master/conf/igenomes.config
To perform the calculation, I ran the script unique-kmers.py as described at:
https://deeptools.readthedocs.io/en/develop/content/feature/effectiveGenomeSize.html
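For illustration only, here is a toy sketch of the idea behind this kind of estimate: counting the distinct k-mers in a sequence. It is not the unique-kmers.py implementation, which as far as I understand estimates the count probabilistically so that it scales to whole genomes. The function name `count_distinct_kmers` is hypothetical, and skipping k-mers that contain N and ignoring the reverse strand are simplifying assumptions of mine.

```python
def count_distinct_kmers(sequence, k):
    """Count distinct k-mers in one sequence (exact, toy-scale only).

    Assumptions for this sketch: k-mers containing N are skipped and
    the reverse strand is ignored.
    """
    sequence = sequence.upper()
    seen = set()
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if "N" not in kmer:
            seen.add(kmer)
    return len(seen)


# ACGT, CGTA, GTAC, TACG, ACGT: the repeated ACGT is counted once
print(count_distinct_kmers("ACGTACGT", 4))  # prints 4
```

An exact set like this would not fit in memory for a mammalian genome at k=100, which is why an estimator is used in practice.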
My result for human38 (assuming 100-bp k-mers) was similar to the value reported in iGenomes: 2.8e9 (my calculation) vs 2.7e9 (iGenomes).
However, the result for mouse38 was substantially different: 2.47e9 (my calculation) vs 1.87e9 (iGenomes).
(As might be expected, my calculations agree with the values for 100-bp k-mers displayed at https://deeptools.readthedocs.io/en/develop/content/feature/effectiveGenomeSize.html.)
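To put the two discrepancies on the same scale, the relative differences can be worked out from the four numbers above. This is just arithmetic on the values already quoted; the helper name `relative_difference` is made up for the example.

```python
def relative_difference(calculated, reference):
    """Difference of calculated from reference, as a fraction of reference."""
    return (calculated - reference) / reference


# (my calculation, iGenomes) as quoted in this issue
values = {"human": (2.8e9, 2.7e9), "mouse": (2.47e9, 1.87e9)}

for genome, (calculated, igenomes) in values.items():
    print(f"{genome}: {relative_difference(calculated, igenomes):+.1%}")
# human: +3.7%
# mouse: +32.1%
```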
I spoke to the MACS developers and they appear to agree that these values need updating:
macs3-project/MACS#508 (comment)
I believe the relevant nf-core documentation should be updated to show these new values.
(It may be of interest to you that I have been putting together a script that automates reference genome downloading. I shall incorporate the deepTools k-mer estimation of genome size into the automated download process, and I shall share it with you when it is ready, in case it is of any use.)
Many thanks,
Steven