Extract IO errors from h2 for streaming retries of Connection Reset #15675
Merged
Our streaming retries were missing connection reset errors as h2 was shadowing IO errors (hyperium/h2#862).

**Test plan**

In one terminal:

```
cargo python uninstall 3.12 && cargo run python install 3.12 -vv
```

In another:

```
sudo tcpkill -i wlp2s0 port 443
```

Output:

```
error: Failed to install cpython-3.12.11-linux-x86_64-gnu
  Caused by: Request failed after 3 retries
  Caused by: Failed to download https://github.com/astral-sh/python-build-standalone/releases/download/20250902/cpython-3.12.11%2B20250902-x86_64-unknown-linux-gnu-install_only_stripped.tar.gz
  Caused by: error sending request for url (https://github.com/astral-sh/python-build-standalone/releases/download/20250902/cpython-3.12.11%2B20250902-x86_64-unknown-linux-gnu-install_only_stripped.tar.gz)
  Caused by: client error (SendRequest)
  Caused by: connection error
  Caused by: connection reset
```

I don't know how to test that from inside Rust.

Fix #14171 (again, hopefully)
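To illustrate what "shadowing" means here, the following is a minimal, standard-library-only sketch, not the code from this PR. `ShadowingError` and `RequestError` are hypothetical stand-ins for a protocol-level error and the client layers above it; the assumption is that the protocol error keeps the IO error inside but does not report it through `source()`, so a plain walk of the error chain never sees it. The helper therefore checks an explicit accessor (named `get_io` here for illustration) at each step of the walk.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Hypothetical protocol error: holds an IO error but hides it from `source()`.
#[derive(Debug)]
struct ShadowingError {
    io: Option<io::Error>,
}

impl ShadowingError {
    /// Explicit accessor for the hidden IO error (illustrative name).
    fn get_io(&self) -> Option<&io::Error> {
        self.io.as_ref()
    }
}

impl fmt::Display for ShadowingError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "connection error")
    }
}

// No `source()` override: the default returns `None`, which shadows the IO error.
impl Error for ShadowingError {}

/// Hypothetical outer wrapper, standing in for the HTTP client layers.
#[derive(Debug)]
struct RequestError {
    source: Box<dyn Error + 'static>,
}

impl fmt::Display for RequestError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "error sending request")
    }
}

impl Error for RequestError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.source)
    }
}

/// Walk the error chain and return the first IO error found, either directly
/// in the chain or hidden inside a `ShadowingError`.
fn find_io_error<'a>(err: &'a (dyn Error + 'static)) -> Option<&'a io::Error> {
    let mut current = Some(err);
    while let Some(e) = current {
        if let Some(io_err) = e.downcast_ref::<io::Error>() {
            return Some(io_err);
        }
        if let Some(shadowing) = e.downcast_ref::<ShadowingError>() {
            if let Some(io_err) = shadowing.get_io() {
                return Some(io_err);
            }
        }
        current = e.source();
    }
    None
}

/// Decide whether an error chain ends in a connection reset.
fn is_connection_reset(err: &(dyn Error + 'static)) -> bool {
    find_io_error(err).map_or(false, |io_err| io_err.kind() == io::ErrorKind::ConnectionReset)
}
```

A retry policy could call `is_connection_reset` on the failed request's error and treat a `true` result as transient. Without the accessor branch, the walk would stop at `ShadowingError` and the reset would look like a non-retryable failure.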
tmeijn pushed a commit to tmeijn/dotfiles that referenced this pull request Sep 12, 2025
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [astral-sh/uv](https://github.com/astral-sh/uv) | patch | `0.8.15` -> `0.8.17` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>astral-sh/uv (astral-sh/uv)</summary>

### [`v0.8.17`](https://github.com/astral-sh/uv/blob/HEAD/CHANGELOG.md#0817)

[Compare Source](astral-sh/uv@0.8.16...0.8.17)

Released on 2025-09-10.

##### Enhancements

- Improve error message for HTTP validation in auth services ([#15768](astral-sh/uv#15768))
- Respect `PYX_API_URL` when suggesting `uv auth login` on 401 ([#15774](astral-sh/uv#15774))
- Add pyx as a supported PyTorch index URL ([#15769](astral-sh/uv#15769))

##### Bug fixes

- Avoid initiating login flow for invalid API keys ([#15773](astral-sh/uv#15773))
- Do not search for a password for requests with a token attached already ([#15772](astral-sh/uv#15772))
- Filter pre-release Python versions in `uv init --script` ([#15747](astral-sh/uv#15747))

### [`v0.8.16`](https://github.com/astral-sh/uv/blob/HEAD/CHANGELOG.md#0816)

[Compare Source](astral-sh/uv@0.8.15...0.8.16)

##### Enhancements

- Allow `--editable` to override `editable = false` annotations ([#15712](astral-sh/uv#15712))
- Allow `editable = false` for workspace sources ([#15708](astral-sh/uv#15708))
- Show a dedicated error for virtual environments in source trees on build ([#15748](astral-sh/uv#15748))
- Support Android platform tags ([#15646](astral-sh/uv#15646))
- Support iOS platform tags ([#15640](astral-sh/uv#15640))
- Support scripts with inline metadata in `--with-requirements` and `--requirements` ([#12763](astral-sh/uv#12763))

##### Preview features

- Support `--no-project` in `uv format` ([#15572](astral-sh/uv#15572))
- Allow `uv format` in unmanaged projects ([#15553](astral-sh/uv#15553))

##### Bug fixes

- Avoid erroring when `match-runtime` target is optional ([#15671](astral-sh/uv#15671))
- Ban empty usernames and passwords in `uv auth` ([#15743](astral-sh/uv#15743))
- Error early for parent path in build backend ([#15733](astral-sh/uv#15733))
- Retry on IO errors during HTTP/2 streaming ([#15675](astral-sh/uv#15675))
- Support recursive requirements and constraints inclusion ([#15657](astral-sh/uv#15657))
- Use token store credentials for `uv publish` ([#15759](astral-sh/uv#15759))
- Fix virtual environment activation script compatibility with latest nushell ([#15272](astral-sh/uv#15272))
- Skip Python interpreters that cannot be queried with permission errors ([#15685](astral-sh/uv#15685))

##### Documentation

- Clarify that `uv auth` commands take a URL ([#15664](astral-sh/uv#15664))
- Improve the CLI help for options that accept requirements files ([#15706](astral-sh/uv#15706))
- Adds example for caching for managed Python downloads in Docker builds ([#15689](astral-sh/uv#15689))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

- [ ] If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
swoboda1337 added a commit to swoboda1337/esphome that referenced this pull request Feb 10, 2026
Upgrade uv from 0.6.14 to 0.10.1 to pick up the fix for HTTP/2 connection reset retry handling (astral-sh/uv#15675, released in 0.8.16). Also set UV_HTTP_RETRIES=10 (default 3) to better handle transient network errors during PlatformIO penv bootstrap.

Remove the UV_CACHE_DIR override since pioarduino now handles this upstream (pioarduino/platform-espressif32#386).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
danielhanchen added a commit to unslothai/unsloth that referenced this pull request Jun 16, 2026
…oad failures (#6281)

* Make Studio installer resilient to transient uv download failures

Updating an existing Studio install via install.sh could hard-fail and roll back when a wheel download (torch, unsloth) hit a transient connection reset:

    x Failed to download unsloth==2026.6.6
      error decoding response body -> error reading a body from connection -> connection reset
    restoring previous environment after failed install...

Root cause: that error chain is a mid-stream HTTP/2 body read failure. uv did not retry this class until 0.8.16 (astral-sh/uv#15675, h2 was shadowing the underlying IO error), but the installer pinned UV_MIN_VERSION=0.7.22, so a stale uv got zero retries and a single blip aborted the whole update under set -e.

Fix (installer only, backwards compatible, no change on success):
- Raise UV_MIN_VERSION to 0.8.16 so stale uv is upgraded to a version that retries HTTP/2 streaming body errors.
- Export UV_HTTP_RETRIES=5 and UV_HTTP_TIMEOUT=180 (override-preserving :=).
- Add run_install_cmd_retry (retry-with-backoff around run_install_cmd) and use it for the network-heavy uv pip install steps (torch, unsloth, unsloth-zoo from git, ROCm torch repair, no-torch runtime deps). Local editable overlays and venv creation are left to fail fast.

run_install_cmd_retry preserves the final exit code on permanent failure, so the existing set -e rollback trap still fires.

* Apply the same transient-download resilience to the Windows installer

install.ps1 is the native-Windows installer and had the identical issue as install.sh: it pinned $UvMinVersion=0.7.22 (below uv 0.8.16, which is where uv started retrying HTTP/2 streaming body errors), set no UV_HTTP_* defaults, and ran each 'uv pip install' once via Invoke-InstallCommand, so a single connection reset aborted the update and triggered the Exit-InstallFailure rollback.

install.ps1:
- Raise $UvMinVersion to 0.8.16.
- Default $env:UV_HTTP_RETRIES=5 and $env:UV_HTTP_TIMEOUT=180 (preserving overrides).
- Add Invoke-InstallCommandRetry and use it for the network-heavy uv pip install steps (torch, unsloth, unsloth-zoo from git, ROCm torch, no-torch runtime deps). Local editable overlays and venv creation stay single-shot.

install.sh:
- Align UNSLOTH_INSTALL_RETRIES sanitization with the PowerShell version: a non-positive-integer value now falls back to the default of 3 instead of silently disabling retries (set =1 to disable). Keeps both installers identical.

* Adopt pre-marker Studio llama.cpp and sidecar dirs on update

After the uv retry fix, an update now reaches studio/setup.sh, whose Studio-owned ownership guard rejects a llama.cpp or sidecar venv created by an earlier install that predates the .unsloth-studio-owned marker:

    ERROR: .../llama.cpp already exists and is not marked as a Studio-owned llama.cpp install.

The marker and UNSLOTH_PREBUILT_INFO.json were introduced in the same commit, so a directory from before that point carries neither signal and a legitimate self-update fails for anyone who installed earlier (reported on issue #6274).

Fold a one-time adoption into _assert_studio_owned_or_absent (setup.sh) and Assert-StudioOwnedOrAbsent (setup.ps1): when a custom-home directory lacks the marker, backfill it and proceed only when there is positive evidence it belongs to an established Studio home -- the directory carries UNSLOTH_PREBUILT_INFO.json, or STUDIO_HOME already holds Studio's CLI shim or studio.conf from a prior run. Both installers write the shim and studio.conf only after invoking setup, so a fresh install into a dirty custom home (the case the guard protects) does not have them yet and is still rejected. The venv marker is excluded because install writes it before setup and so cannot tell a prior install from a fresh one.

* Review fixes: restrict llama.cpp adoption to dir-local evidence; restore install.sh +x

Addresses the PR review on the marker-migration change.

P1 - the adoption helper keyed on root-level Studio sentinels ($STUDIO_HOME/bin/unsloth, share/studio.conf), so once a home was recognized every unmarked child passed to the guard became adoptable, and an unrelated directory at a Studio-managed path could be silently marked and overwritten. Base adoption on evidence inside the directory instead:
- UNSLOTH_PREBUILT_INFO.json, written by the prebuilt llama.cpp installer (the default path, in place well before the marker), or
- a top-level llama-quantize symlink, written by source builds (a plain llama.cpp checkout keeps the binary under build/bin, not a root symlink).

A foreign llama.cpp now stays rejected even inside an established Studio home, and sidecar venvs (no such fingerprint) stay subject to the strict guard; their marker has been written since the guard was introduced, so a real custom install already carries it.

P2 - restore the executable bit on install.sh; a stray mode change to 100644 would break ./install.sh --local on Unix.

On Windows the prebuilt metadata is the signal; source builds are git checkouts indistinguishable from a user clone, so they are left to the strict guard.

* Bound UNSLOTH_INSTALL_RETRIES / _DELAY before numeric use

An oversized all-digit override (e.g. a fat-fingered "99999999999999999999") passed the digit-only validation and then reached the numeric comparison: POSIX `[ -ge ]` errored with "Illegal number" mid-loop and could spin instead of falling back, and PowerShell's `[int]` cast threw an Int32 overflow under $ErrorActionPreference = "Stop" before any install ran.

Sanitize with a length guard + range check (sh) and [int]::TryParse with bounds (ps1), so out-of-range or oversized values fall back to the default. Bounds: 1..100 retries, 0..3600s base delay.

* Studio installers: scope llama.cpp adoption to prebuilt metadata; reject leading-zero retry delay

setup.sh: drop the top-level llama-quantize symlink as an ownership-adoption signal, leaving UNSLOTH_PREBUILT_INFO.json as the sole fingerprint. The shared ownership guard runs immediately before a destructive replace / rm -rf, and a bare root llama-quantize symlink is user-creatable (a user can keep their own llama.cpp build with such a convenience symlink at a custom UNSLOTH_STUDIO_HOME), so the old check could adopt and then delete a user directory. This matches the Windows installer, which already keeps markerless source builds strict. Pre-marker prebuilt installs still adopt via the metadata file, so the original update fix is preserved.

install.sh: reject leading-zero values for UNSLOTH_INSTALL_RETRY_DELAY. A value like 08 or 09 passed the range check but then hit the backoff doubling $((_ricr_delay * 2)), where a non-octal leading zero is a fatal arithmetic error mid-retry. The 0?* pattern routes such values to the default; bare 0 stays valid.

* Tighten the comments added in this PR

* Condense the comments in this PR
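The override sanitization described in this commit message (digits only, bounded range, leading zeros rejected, fallback to a default) can be sketched in Rust as follows. This is an illustration of the rules as stated, not a port of the installers' shell or PowerShell code: the function names are hypothetical, the default delay is a caller-supplied parameter because the message does not state it, and applying the leading-zero rule to the retry count as well as the delay is an assumption made here for simplicity.

```rust
/// Parse a decimal override within `min..=max`, falling back to `default`
/// for anything missing, malformed, oversized, or out of range.
fn parse_bounded(raw: Option<&str>, min: u32, max: u32, default: u32) -> u32 {
    let s = match raw {
        Some(s) => s,
        None => return default,
    };
    // Digits only: rejects empty strings, signs, whitespace, and letters.
    if s.is_empty() || !s.bytes().all(|b| b.is_ascii_digit()) {
        return default;
    }
    // Reject leading zeros such as "08", but keep a bare "0".
    if s.len() > 1 && s.starts_with('0') {
        return default;
    }
    // `parse::<u32>` returns `Err` on overflow, so an oversized all-digit
    // value falls back to the default instead of failing later.
    match s.parse::<u32>() {
        Ok(n) if (min..=max).contains(&n) => n,
        _ => default,
    }
}

/// Retry count: bounds 1..=100 as stated in the commit message.
fn sanitize_retries(raw: Option<&str>, default: u32) -> u32 {
    parse_bounded(raw, 1, 100, default)
}

/// Base delay in seconds: bounds 0..=3600 as stated in the commit message.
fn sanitize_delay_secs(raw: Option<&str>, default: u32) -> u32 {
    parse_bounded(raw, 0, 3600, default)
}
```

With a default of 3, `sanitize_retries(Some("99999999999999999999"), 3)` falls back to 3 rather than erroring, and `sanitize_delay_secs(Some("08"), d)` returns `d` while `Some("0")` stays valid.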