--macs_gsize option #280

@StevenWingett

Following on from a discussion on Slack with Harshil Patel, I wanted to raise an issue regarding the genome size needed for the --macs_gsize option when running the nf-core/chipseq pipeline (and others).

I shall be using genomes not available in iGenomes and so needed to calculate this value myself. To check I could do this correctly, I tried to get the same values for human and mouse as reported at: https://github.com/nf-core/chipseq/blob/master/conf/igenomes.config

To perform the calculation, I ran the script unique-kmers.py as described at:
https://deeptools.readthedocs.io/en/develop/content/feature/effectiveGenomeSize.html
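For readers unfamiliar with the approach, the idea behind this estimate can be sketched as counting the distinct k-mers in the reference. The block below is only an illustration under stated assumptions, not the unique-kmers.py script itself: it keeps an exact set in memory (practical only for small genomes, whereas tools for mammalian genomes rely on probabilistic estimators), it ignores reverse complements, and it skips any k-mer containing a base other than A, C, G or T. The function names `read_fasta` and `count_distinct_kmers` are hypothetical.

```python
def read_fasta(path):
    """Yield one concatenated sequence string per FASTA record."""
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if chunks:
                    yield "".join(chunks)
                chunks = []
            elif line:
                chunks.append(line)
    if chunks:
        yield "".join(chunks)


def count_distinct_kmers(sequences, k):
    """Count distinct k-mers made only of A/C/G/T across all sequences.

    Assumption: exact counting with a set, forward strand only.
    """
    valid = set("ACGT")
    seen = set()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip k-mers overlapping N or other ambiguity codes.
            if set(kmer) <= valid:
                seen.add(kmer)
    return len(seen)
```

With a small FASTA file at a hypothetical path such as `genome.fa`, `count_distinct_kmers(read_fasta("genome.fa"), 100)` would give the 100-bp figure for that file.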

My result for human38 (assuming 100-bp k-mers) was similar to the value reported in iGenomes: 2.8e9 (my calculation) vs 2.7e9 (iGenomes).
However, my result for mouse38 was substantially different: 2.47e9 (my calculation) vs 1.87e9 (iGenomes).
(As might be expected, my calculations agree with those displayed on https://deeptools.readthedocs.io/en/develop/content/feature/effectiveGenomeSize.html for 100-bp k-mers.)
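To put the two discrepancies on the same scale, the relative difference between each calculated value and the iGenomes value can be worked out as below. This is a small arithmetic sketch using only the figures quoted above; the helper name `relative_difference` is hypothetical.

```python
def relative_difference(computed, reference):
    """Return (computed - reference) / reference as a fraction."""
    return (computed - reference) / reference


# (genome, value calculated here, value in igenomes.config)
comparisons = [
    ("human38", 2.8e9, 2.7e9),
    ("mouse38", 2.47e9, 1.87e9),
]

for name, computed, reference in comparisons:
    print(f"{name}: {relative_difference(computed, reference):+.1%}")
# human38: +3.7%
# mouse38: +32.1%
```

On these figures the human value differs by under 4%, while the mouse value differs by roughly a third.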

I spoke to the MACS developers and they appear to agree that these values need updating:
macs3-project/MACS#508 (comment)

I believe the relevant nf-core documentation should be updated to show these new values.

(It may be of interest to you that I have been putting together a script that automates reference genome downloading. I shall incorporate the deepTools k-mer estimation of genome size into the automated download process, and I shall share the script with you when it is ready, in case it is of any use.)

Many thanks,

Steven
