
Open .conda ZipFile once per extract (B13) #318

Draft
jezdez wants to merge 5 commits into conda:main from jezdez:jezdez/track-b-b13-reuse-zipfile


jezdez (Member) commented Apr 27, 2026

Description

cph.streaming._extract iterates over the two components (pkg and info) of a .conda archive and calls cps.stream_conda_component once for each. Each call internally instantiates a fresh zipfile.ZipFile(fileobj), which locates the end-of-central-directory record and parses the central directory from scratch.
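To make the cost concrete, here is a minimal stdlib-only sketch of the pre-PR pattern. It does not use cph or cps: read_component_naive and CountingZipFile are hypothetical names, and the in-memory archive only mimics the .conda layout (metadata.json plus info-/pkg- members). Because each call constructs its own ZipFile, reading two components parses the archive's central directory twice.

```python
import io
import zipfile


def make_demo_conda() -> io.BytesIO:
    # In-memory archive mimicking the .conda layout (hypothetical contents).
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("metadata.json", '{"conda_pkg_format_version": 2}')
        zf.writestr("info-demo-1.0-0.tar.zst", b"info payload")
        zf.writestr("pkg-demo-1.0-0.tar.zst", b"pkg payload")
    buf.seek(0)
    return buf


class CountingZipFile(zipfile.ZipFile):
    """ZipFile that counts constructions; each read-mode one re-parses the central directory."""

    opened = 0

    def __init__(self, *args, **kwargs):
        CountingZipFile.opened += 1
        super().__init__(*args, **kwargs)


def read_component_naive(fileobj, component: str) -> bytes:
    # Hypothetical helper following the pre-PR pattern: a fresh ZipFile per call.
    with CountingZipFile(fileobj) as zf:
        name = next(n for n in zf.namelist() if n.startswith(f"{component}-"))
        return zf.read(name)


archive = make_demo_conda()
for component in ("pkg", "info"):
    read_component_naive(archive, component)
print(CountingZipFile.opened)  # 2
```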

This PR opens the ZipFile once in _extract and threads it through both stream_conda_component calls via the new zf= kwarg added in the companion cps PR (conda/conda-package-streaming#173). There is no behavior change; it saves one ZIP central-directory parse per archive.
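The shape of the change can be sketched the same way. This assumes, as described above, that the component reader accepts an optional already-open ZipFile via zf=. read_component and extract_all are hypothetical stand-ins, not the actual cps/cph functions, whose real signatures may differ.

```python
import io
import zipfile
from typing import Optional


def _read_member(zf: zipfile.ZipFile, component: str) -> bytes:
    # .conda components are stored as "<component>-<name>-<version>-<build>.tar.zst".
    name = next(n for n in zf.namelist() if n.startswith(f"{component}-"))
    return zf.read(name)


def read_component(fileobj, component: str, zf: Optional[zipfile.ZipFile] = None) -> bytes:
    # Hypothetical reader: reuse the caller's ZipFile when given via zf=,
    # otherwise open (and parse) the archive itself, as before.
    if zf is not None:
        return _read_member(zf, component)
    with zipfile.ZipFile(fileobj) as own_zf:
        return _read_member(own_zf, component)


def extract_all(fileobj) -> dict:
    # Open the archive once and thread the same ZipFile through both reads.
    with zipfile.ZipFile(fileobj) as zf:
        return {c: read_component(fileobj, c, zf=zf) for c in ("pkg", "info")}


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as out:
    out.writestr("info-demo-1.0-0.tar.zst", b"info payload")
    out.writestr("pkg-demo-1.0-0.tar.zst", b"pkg payload")
buf.seek(0)
print(extract_all(buf))  # {'pkg': b'pkg payload', 'info': b'info payload'}
```

Keeping the zf=None fallback means callers that only pass a file object keep working unchanged.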

Depends on conda/conda-package-streaming#173 landing first; without the cps-side zf= kwarg this is a no-op.
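If a caller ever had to tolerate a cps release that predates the zf= keyword, one common technique is feature detection on the callee's signature. This is only an illustration of that technique under that assumption; the PR itself simply depends on conda/conda-package-streaming#173, and old_stream/new_stream below are hypothetical placeholders, not cps functions.

```python
import inspect


def accepts_kwarg(func, name: str) -> bool:
    """Return True if func can be called with keyword argument `name`."""
    try:
        params = inspect.signature(func).parameters
    except (TypeError, ValueError):
        # Some callables (e.g. certain builtins) have no introspectable signature.
        return False
    param = params.get(name)
    if param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs catch-all also accepts the keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical placeholders for an older and a newer component reader.
def old_stream(fileobj, component): ...
def new_stream(fileobj, component, zf=None): ...


print(accepts_kwarg(old_stream, "zf"), accepts_kwarg(new_stream, "zf"))  # False True
```

A caller could pass zf= only when the check succeeds and otherwise fall back to the two-open path.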

Part of the conda-tempo Track B performance work. Tracking issue: conda/conda#15969.

Phase-2 micro-benchmark (bench_s13_zipfile_single.py, 10 .conda archives, both components):

| Case | Before | After | Speedup |
|---|---|---|---|
| 10 archives, 2 components each | 999 µs | 502 µs | ~1.99× |
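The benchmark script (bench_s13_zipfile_single.py) is not included here. A rough, self-contained way to compare the two patterns with timeit might look like the following. The synthetic three-member archive and the function names are assumptions, and absolute timings will differ from the table because real .conda files and disk I/O are not involved.

```python
import functools
import io
import timeit
import zipfile

MEMBERS = ("metadata.json", "info-demo-1.0-0.tar.zst", "pkg-demo-1.0-0.tar.zst")


def make_archive() -> bytes:
    # Synthetic stand-in for a .conda file: same three-member layout, tiny payloads.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in MEMBERS:
            zf.writestr(name, b"x")
    return buf.getvalue()


def open_per_component(data: bytes) -> None:
    # Pre-PR pattern: a fresh ZipFile (and central-directory parse) per component.
    fileobj = io.BytesIO(data)
    for component in ("pkg", "info"):
        with zipfile.ZipFile(fileobj) as zf:
            zf.getinfo(f"{component}-demo-1.0-0.tar.zst")


def open_once(data: bytes) -> None:
    # Post-PR pattern: one ZipFile shared by both component lookups.
    fileobj = io.BytesIO(data)
    with zipfile.ZipFile(fileobj) as zf:
        for component in ("pkg", "info"):
            zf.getinfo(f"{component}-demo-1.0-0.tar.zst")


def bench(number: int = 2000) -> dict:
    """Return mean seconds per archive for each strategy."""
    data = make_archive()
    return {
        fn.__name__: timeit.timeit(functools.partial(fn, data), number=number) / number
        for fn in (open_per_component, open_once)
    }


if __name__ == "__main__":
    for name, seconds in bench().items():
        print(f"{name}: {seconds * 1e6:.1f} µs per archive")
```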

Phase-4 end-to-end (as part of the full cps + cph stack):

| Workload | macOS baseline → stacked | Linux baseline → stacked |
|---|---|---|
| W4 (cold-cache data-science install) | 43.88 s → 36.14 s | 26.28 s → 23.38 s |
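The relative improvement can be worked out directly from the reported W4 numbers. Note that these figures cover the whole stacked cps + cph change set, so they do not isolate the effect of this PR alone; summarize is just a hypothetical helper.

```python
# W4 end-to-end wall times, in seconds: (baseline, stacked).
W4_RESULTS = {
    "macOS": (43.88, 36.14),
    "Linux": (26.28, 23.38),
}


def summarize(baseline: float, stacked: float) -> tuple:
    """Return (seconds saved, % of baseline saved, speedup factor), rounded."""
    saved = baseline - stacked
    return round(saved, 2), round(saved / baseline * 100, 1), round(baseline / stacked, 2)


for platform, (baseline, stacked) in W4_RESULTS.items():
    saved, pct, speedup = summarize(baseline, stacked)
    print(f"{platform}: {saved:.2f} s saved ({pct:.1f}% of baseline), {speedup:.2f}x")
```

This works out to roughly 7.7 s (about 18%) saved on macOS and 2.9 s (about 11%) on Linux.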

Full research report: https://github.com/jezdez/conda-tempo/blob/main/track-b-transaction.md

Checklist - did you ...

  • Add a file to the news directory (using the template) for the next release's release notes?
  • Add / update necessary tests? (existing extract tests cover the .conda pkg + info round-trip; happy to add a regression test that asserts only one ZipFile is constructed)
  • Add / update outdated documentation? (no user-facing docs change)
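A minimal sketch of the regression test mentioned in the checklist, assuming a spy on zipfile.ZipFile is acceptable. extract_components and the member names are hypothetical; a real test would call cph's extract path and patch ZipFile where that code looks it up (if a module does `from zipfile import ZipFile`, patching zipfile.ZipFile would not intercept it).

```python
import io
import unittest
import zipfile
from unittest import mock


def _make_archive() -> io.BytesIO:
    # Tiny in-memory archive with .conda-style component names (hypothetical).
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("info-demo-1.0-0.tar.zst", b"info")
        zf.writestr("pkg-demo-1.0-0.tar.zst", b"pkg")
    buf.seek(0)
    return buf


def extract_components(fileobj) -> dict:
    # Hypothetical stand-in for the patched _extract: one ZipFile, two reads.
    with zipfile.ZipFile(fileobj) as zf:
        return {c: zf.read(f"{c}-demo-1.0-0.tar.zst") for c in ("pkg", "info")}


class OneZipFileTest(unittest.TestCase):
    def test_zipfile_constructed_once(self):
        fileobj = _make_archive()
        # Spy on ZipFile: calls still reach the real class via wraps=.
        with mock.patch.object(zipfile, "ZipFile", wraps=zipfile.ZipFile) as spy:
            result = extract_components(fileobj)
        self.assertEqual(spy.call_count, 1)
        self.assertEqual(result, {"pkg": b"pkg", "info": b"info"})
```

It runs under pytest or `python -m unittest`, both of which collect unittest.TestCase subclasses.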

jezdez force-pushed the jezdez/track-b-b13-reuse-zipfile branch from feb521a to 42868cd April 27, 2026 08:34
jezdez changed the title from "perf: open .conda ZipFile once per extract (track-b B13)" to "Open .conda ZipFile once per extract (B13)" Apr 27, 2026
github-project-automation (bot) moved this to 🆕 New in 🔎 Review Apr 27, 2026
conda-bot added the cla-signed label Apr 27, 2026
jezdez added a commit to jezdez/conda-package-handling that referenced this pull request Apr 27, 2026
jezdez added a commit to jezdez/conda-package-handling that referenced this pull request Apr 27, 2026
jezdez force-pushed the jezdez/track-b-b13-reuse-zipfile branch from 133ceb0 to acb77f5 April 27, 2026 08:57
dholth (Contributor) commented Jun 9, 2026

@jezdez this can be continued now that conda-package-streaming has been updated


Labels

cla-signed [bot] added once the contributor has signed the CLA

Projects

Status: In Progress 🏗️
Status: 🆕 New


4 participants