Reduce memory consumption of very high-resolution merges #408

Merged
nvictus merged 7 commits into open2c:master from nvictus:fix-hires-merge
Mar 23, 2024

@nvictus
Member

@nvictus nvictus commented Mar 20, 2024

This PR addresses high memory consumption when a large number of very high-resolution coolers are merged. It should improve the performance not only of cooler merge but also of cooler cload pairs and cooler load.

In pre-calculating offsets for the merge execution plan, we were loading (and concatenating) all bin1_offset indexes into memory. This isn't an issue for typical coolers, but the combined indexes can become prohibitively large for many inputs at high resolutions: a single index vector can be ~2GB in size at human 10bp resolution.
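As a rough back-of-the-envelope check on that figure, the sketch below estimates the size of one bin1_offset vector. The genome length (~3.1e9 bp), the 8-byte offset width, and the "one offset per bin plus a terminal entry" layout are assumptions made for illustration, not values stated in this PR, and `index_bytes` is a hypothetical helper.

```python
def index_bytes(genome_bp, bin_size_bp, bytes_per_offset=8):
    """Estimate the in-memory size of one bin1_offset vector, in bytes.

    Assumes one offset per genomic bin plus one terminal entry, and treats
    the genome as a single contiguous span (per-chromosome rounding is ignored).
    """
    n_bins = -(-genome_bp // bin_size_bp)  # ceiling division
    return (n_bins + 1) * bytes_per_offset


# Assumed inputs: approximate human genome length, 10 bp bins.
one_index = index_bytes(3_100_000_000, 10)
print(f"one index:   {one_index / 1e9:.2f} GB")
print(f"100 indexes: {100 * one_index / 1e9:.0f} GB")
```

Under these assumptions a single index is on the order of 2 GB, so holding one per input for hundreds of inputs would exceed the memory of most machines.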

  • Now we use lazy HDF5 datasets and load each bin1_offset index incrementally during merge execution planning. This results in a drastic improvement in memory use for merges involving, e.g., hundreds of datasets.
  • We also expose the merge buffer argument to cooler cload pairs and cooler load, and the max-merge option to cooler load, to give the user more flexibility in controlling maximum memory consumption during the actual merge epochs.
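To illustrate the idea behind the two points above, here is a minimal sketch, not cooler's actual implementation. It accumulates per-row record counts while materializing only one offset index at a time, then greedily splits rows into merge epochs that fit a buffer size. The function names `combined_row_counts` and `plan_epochs` and the `bufsize` parameter are hypothetical. Plain lists stand in for the lazy datasets; an h5py dataset supports the same slicing and reads from disk only when sliced.

```python
def combined_row_counts(index_sources):
    """Sum per-row record counts across inputs, one index in memory at a time.

    `index_sources` is an iterable of sliceable offset vectors (lists here).
    Each vector has one offset per row plus a terminal entry.
    """
    total = None
    for source in index_sources:
        offsets = source[:]  # materialize only this input's index
        counts = [b - a for a, b in zip(offsets, offsets[1:])]
        if total is None:
            total = counts
        else:
            total = [t + c for t, c in zip(total, counts)]
        # offsets and counts are rebound on the next iteration, so earlier
        # indexes can be garbage-collected instead of accumulating
    return total


def plan_epochs(row_counts, bufsize):
    """Greedily split rows into (start, stop) spans of at most `bufsize` records.

    A single row holding more than `bufsize` records gets a span of its own.
    """
    bounds = [0]
    running = 0
    for row, n in enumerate(row_counts):
        if running and running + n > bufsize:
            bounds.append(row)
            running = 0
        running += n
    bounds.append(len(row_counts))
    return list(zip(bounds[:-1], bounds[1:]))


# Two toy inputs with four rows each.
idx_a = [0, 2, 5, 5, 9]   # per-row counts: 2, 3, 0, 4
idx_b = [0, 1, 1, 4, 6]   # per-row counts: 1, 0, 3, 2
counts = combined_row_counts([idx_a, idx_b])
print(counts)                          # [3, 3, 3, 6]
print(plan_epochs(counts, bufsize=6))  # [(0, 2), (2, 3), (3, 4)]
```

In this sketch, peak memory during planning scales with one index plus the running totals rather than with the number of inputs, and a smaller `bufsize` trades more epochs for a lower memory ceiling during the merge itself.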

@nvictus nvictus requested review from Phlya and thomas-reimonn March 20, 2024 23:24
@nvictus nvictus merged commit a1b6cb0 into open2c:master Mar 23, 2024
