Fix self-sustaining WebRTC reconnect storm in remote transport#1888
Merged
Conversation
The remote WebRTC transport could enter a perpetual reconnect loop in which a healthy connection (data channel open, ICE connected) was torn down and rebuilt roughly every 2s and never stabilised. On iOS PWAs this showed up as an endless "connecting → connected → reconnecting" cycle after backgrounding the app. Root cause was three compounding issues in WebRTCTransport: - cleanup() closed the data channel with its onclose handler still attached, so tearing down to reconnect scheduled another reconnect - the engine of the loop. - connect()/scheduleReconnect() were not serialised, so multiple recovery drivers (data channel close, peer-disconnected, connection failure) ran concurrently and corrupted the shared peer connection and SDP state (InvalidModificationError on setLocalDescription). - reconnectAttempts reset on every successful connect, so the backoff never escalated during a flapping loop and stayed at a flat ~1.5s. Changes: - Detach data channel and peer connection handlers before closing them in cleanup() so teardown no longer re-triggers recovery. - Serialise connect() with an in-flight guard and bail from scheduleReconnect() when a reconnect is pending or a connect is in flight. - Defer the backoff reset until the connection has been stable for 5s and add jitter to the reconnect delay. - Treat only the terminal ICE `failed` state as a failure; let the transient `disconnected` state self-heal. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
aab7587 to
c938bbe
Compare
Contributor
There was a problem hiding this comment.
Pull request overview
This PR addresses a self-sustaining reconnect loop (“reconnect storm”) in the remote WebRTC transport by preventing teardown from re-triggering recovery, serializing connection attempts to avoid SDP/PC state corruption, and making reconnect backoff behave correctly under flapping connections.
Changes:
- Detach DataChannel/PeerConnection handlers before closing in
cleanup()to avoid teardown re-entering reconnection logic. - Serialize
connect()and gatescheduleReconnect()to prevent concurrent negotiations and duplicate reconnect scheduling. - Improve reconnect backoff behavior by adding jitter and only resetting the backoff after the connection remains stable for 5 seconds; also treat ICE
failed(notdisconnected) as the failure signal.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Addresses review feedback: cleanup() didn't cancel stableConnectionTimer, so on a non-reconnect teardown (reconnect disabled, or peer-disconnect via the non-reconnect path) the pending backoff reset could fire after the transport was closed and kept the instance referenced until it did. Clear it in cleanup() - the single teardown funnel - and drop the now-redundant clear in disconnect(). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Addresses review feedback: the stable-connection timer was cleared only after the max-attempts guard, so with a finite maxReconnectAttempts a drop within the 5s stability window could give up (FAILED) yet leave the timer running, which then reset reconnectAttempts to 0 and bypassed the max. Clear it at the top of scheduleReconnect() so it's cancelled on every path. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
github-actions Bot
pushed a commit
to music-assistant/server
that referenced
this pull request
Jun 8, 2026
Update music-assistant-frontend to version [2.17.181](https://github.com/music-assistant/frontend/releases/tag/2.17.181) - Fix queue items disappearing in fullscreen player (by @MarvinSchenkel in [#1874](music-assistant/frontend#1874)) - Add translation key for now playing badge (by @MarvinSchenkel in [#1889](music-assistant/frontend#1889)) ## 🚀 Features and enhancements - Refactor heart icon and add it to the artist page (by @stvncode in [#1891](music-assistant/frontend#1891)) - Add back subtitle for discover page (by @stvncode in [#1890](music-assistant/frontend#1890)) - Bigger tiles on mobile (by @stvncode in [#1887](music-assistant/frontend#1887)) ## 🐛 Bugfixes - Fix self-sustaining WebRTC reconnect storm in remote transport (by @MarvinSchenkel in [#1888](music-assistant/frontend#1888)) - Subtle placeholder for both dark and light mode (by @stvncode in [#1886](music-assistant/frontend#1886)) ## 🧰 Maintenance and dependency bumps <details> <summary>8 changes</summary> - Add built-in playlists for favorites and random tracks (by @OzGav in [#1876](music-assistant/frontend#1876)) - Chore(deps): Bump ludeeus/action-require-labels from 1.1.0 to 2.0.0 (by @[dependabot[bot]](https://github.com/apps/dependabot) in [#1878](music-assistant/frontend#1878)) - Chore(deps): Bump happy-dom from 20.9.0 to 20.10.2 (by @[dependabot[bot]](https://github.com/apps/dependabot) in [#1879](music-assistant/frontend#1879)) - Chore(deps): Bump marked from 18.0.3 to 18.0.5 (by @[dependabot[bot]](https://github.com/apps/dependabot) in [#1880](music-assistant/frontend#1880)) - Chore(deps-dev): Bump @types/node from 25.9.1 to 25.9.2 (by @[dependabot[bot]](https://github.com/apps/dependabot) in [#1881](music-assistant/frontend#1881)) - Chore(deps): Bump @tanstack/vue-form from 1.32.0 to 1.33.0 (by @[dependabot[bot]](https://github.com/apps/dependabot) in [#1882](music-assistant/frontend#1882)) - Chore(deps): Bump vue-i18n from 11.4.4 to 11.4.5 (by @[dependabot[bot]](https://github.com/apps/dependabot) in [#1883](music-assistant/frontend#1883)) - Chore(deps): Bump reka-ui from 2.9.8 to 2.9.9 (by @[dependabot[bot]](https://github.com/apps/dependabot) in [#1884](music-assistant/frontend#1884)) </details> ## 🙇 Contributors @MarvinSchenkel, @OzGav and @stvncode Co-authored-by: stvncode <25082266+stvncode@users.noreply.github.com>
anatosun
pushed a commit
to anatosun/music-assistant-server
that referenced
this pull request
Jun 14, 2026
Update music-assistant-frontend to version [2.17.181](https://github.com/music-assistant/frontend/releases/tag/2.17.181) - Fix queue items disappearing in fullscreen player (by @MarvinSchenkel in [music-assistant#1874](music-assistant/frontend#1874)) - Add translation key for now playing badge (by @MarvinSchenkel in [music-assistant#1889](music-assistant/frontend#1889)) ## 🚀 Features and enhancements - Refactor heart icon and add it to the artist page (by @stvncode in [music-assistant#1891](music-assistant/frontend#1891)) - Add back subtitle for discover page (by @stvncode in [music-assistant#1890](music-assistant/frontend#1890)) - Bigger tiles on mobile (by @stvncode in [music-assistant#1887](music-assistant/frontend#1887)) ## 🐛 Bugfixes - Fix self-sustaining WebRTC reconnect storm in remote transport (by @MarvinSchenkel in [music-assistant#1888](music-assistant/frontend#1888)) - Subtle placeholder for both dark and light mode (by @stvncode in [music-assistant#1886](music-assistant/frontend#1886)) ## 🧰 Maintenance and dependency bumps <details> <summary>8 changes</summary> - Add built-in playlists for favorites and random tracks (by @OzGav in [music-assistant#1876](music-assistant/frontend#1876)) - Chore(deps): Bump ludeeus/action-require-labels from 1.1.0 to 2.0.0 (by @[dependabot[bot]](https://github.com/apps/dependabot) in [music-assistant#1878](music-assistant/frontend#1878)) - Chore(deps): Bump happy-dom from 20.9.0 to 20.10.2 (by @[dependabot[bot]](https://github.com/apps/dependabot) in [music-assistant#1879](music-assistant/frontend#1879)) - Chore(deps): Bump marked from 18.0.3 to 18.0.5 (by @[dependabot[bot]](https://github.com/apps/dependabot) in [music-assistant#1880](music-assistant/frontend#1880)) - Chore(deps-dev): Bump @types/node from 25.9.1 to 25.9.2 (by @[dependabot[bot]](https://github.com/apps/dependabot) in [music-assistant#1881](music-assistant/frontend#1881)) - Chore(deps): Bump @tanstack/vue-form from 1.32.0 to 1.33.0 (by @[dependabot[bot]](https://github.com/apps/dependabot) in [music-assistant#1882](music-assistant/frontend#1882)) - Chore(deps): Bump vue-i18n from 11.4.4 to 11.4.5 (by @[dependabot[bot]](https://github.com/apps/dependabot) in [music-assistant#1883](music-assistant/frontend#1883)) - Chore(deps): Bump reka-ui from 2.9.8 to 2.9.9 (by @[dependabot[bot]](https://github.com/apps/dependabot) in [music-assistant#1884](music-assistant/frontend#1884)) </details> ## 🙇 Contributors @MarvinSchenkel, @OzGav and @stvncode Co-authored-by: stvncode <25082266+stvncode@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
In remote (WebRTC) mode the transport could enter a perpetual reconnect loop: a healthy connection — data channel
open, ICEconnected— was torn down and rebuilt roughly every 2 seconds and never stabilised. On iOS PWAs this surfaced as an endless connecting → connected → reconnecting cycle after backgrounding the app, and server-side as a new WebRTC session created and destroyed every ~2s.Root cause
Three compounding issues in
WebRTCTransport(src/plugins/remote/webrtc-transport.ts):cleanup()re-triggered recovery (the engine of the loop). Each reconnect runscleanup()thenconnect().cleanup()closed the data channel while itsonclosehandler was still attached, and that handler callsscheduleReconnect(). So the act of tearing down to reconnect scheduled another reconnect — and sinceconnect()re-creates a healthy channel, the next cleanup closed a healthy channel and re-armed the loop. Once entered it never stopped.connect()/scheduleReconnect()were not serialised. Multiple recovery drivers (dataChannel.onclose, signalingpeer-disconnected, ICE/connectionfailed) — all of which fire together on a real network drop or iOS resume — ran concurrently and corrupted the shared peer connection / SDP state (InvalidModificationError: setLocalDescription ... SDP does not match). The previous guard only blocked whileRECONNECTINGwith a pending timer.reconnectAttemptsreset to0on every successful connect; because each storm cycle briefly succeeds, the delay stayed flat (~1.5s) instead of backing off.Changes
cleanup()(data channel + peer connection) so teardown no longer re-enters the recovery path.connect()via an in-flight guard, and bail fromscheduleReconnect()when a reconnect is already pending or a connect is in flight.failedstate as a failure; let the transientdisconnectedstate self-heal (it usually recovers on its own; the browser escalatesdisconnected → failedif it does not). This aligns with WebRTC guidance.Verification
Reproduced deterministically against a local debug server (frontend dev build connected over real WebRTC/signaling), correlating client console + server logs:
connect()InvalidModificationError, teardownreconnecting25s+ after triggers stopconnected, no pending timer, attempts resetvue-tsc --noEmit, ESLint, and the vitest suite (176 tests) all pass.Follow-up (not in this PR)
In-place ICE restart recovery (
restartIce()). The WebRTC-recommended way to recover from a connectivity drop is an in-place ICE restart, which preserves DTLS keys and data channels instead of tearing down and rebuilding the wholeRTCPeerConnection. This PR makes the existing teardown-and-rebuild path correct and bounded; it does not adoptrestartIce().Adopting it requires the server to support in-place renegotiation on an existing peer connection (today the gateway creates a fresh
RTCPeerConnectionper signaling session), so it is best done alongside the planned switch to a server-side WebRTC library with trickle-ICE support. At that point the natural companion change is to add a short grace timer on ICEdisconnectedthat triggersrestartIce()(cheap recovery) rather than waiting forfailed. The serialised-negotiation guard added here is the foundation that work builds on.