Crash when shutting down Wifi with 'Wifi 0' #24536
Merged
s-hadinger merged 1 commit into arendst:development on Mar 9, 2026
josef109 pushed a commit to josef109/Tasmota that referenced this pull request, Mar 15, 2026
Description:
Fix a long-standing crash when Wifi gets disconnected with the command Wifi 0. The fix ensures that the Webserver and UDP server are closed and reopened each time Wifi gets disconnected.
Here is the full analysis by Claude:
Wifi Shutdown Crash Analysis: Wifi 0 Command
The Bug
Executing Wifi 0 (CmndWifi()) while there is active Web or UDP traffic causes a crash. The crash does not occur when there is no traffic at the time of shutdown.
Crash Traces (call chain, caller → callee)
TCP (webserver):
UDP (Berry/Matter):
Architecture Context
Tasmota's application code is single-threaded and synchronous. The main loop calls Every250mSeconds(), which runs a state machine, and also calls FUNC_LOOP, which runs driver loops including PollDnsWebserver() → Webserver->handleClient().
The webserver uses a listening TCP socket managed by lwIP. Incoming TCP connections land in the socket's accept backlog, a queue of pending connections with associated pbuf chains that reference the network interface (netif) they arrived on.
UDP sockets similarly queue received packets in lwIP's internal receive buffer, with pbuf chains referencing the netif the packet arrived on.
Chain of Events Leading to Crash
The main loop is Scheduler() in tasmota.ino. Each iteration runs, in order:
1. XdrvXsnsCall(FUNC_LOOP): calls all driver loops, including Xdrv01() → PollDnsWebserver() → Webserver->handleClient()
2. Every250mSeconds(): state machine, runs every 250ms
Step 1: CmndWifi(0) is called (in support_command.ino)
- Sets Settings->flag4.network_wifi = 0 (that's all it does for payload 0)
Step 2: Every250mSeconds() state machine, case 2 (in support_tasmota.ino)
- Runs at the x.5 second mark
- Checks Settings->flag4.network_wifi: if 0, calls WifiDisable()
- WifiDisable() (in support_wifi.ino) → WifiShutdown() → WiFi.disconnect(true, true) + WifiSetMode(WIFI_OFF)
- This destroys the WiFi netif in lwIP, freeing its memory structures
- Queued connections and packets are left holding pbuf pointers to the now-destroyed WiFi netif
Step 3: Next Scheduler() iteration, FUNC_LOOP runs
- Xdrv01(FUNC_LOOP) → PollDnsWebserver() → Webserver->handleClient() → lwip_accept() → pbuf_free() on dangling pointer → crash
- Xdrv52(FUNC_LOOP) → Berry udp.read() → lwip_recvfrom() → netbuf_delete → pbuf_free() on dangling pointer → crash
Note: the crash can also happen later (e.g. when WiFi is re-enabled) if the socket read is somehow skipped; the poison persists in the socket until the socket is closed.
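The chain above can be illustrated with a small host-side model. This is only a sketch under stated assumptions, not lwIP or Tasmota code: ToyStack, ToyNetif, ToyPbuf and PollResult are hypothetical names invented here, and a generation counter stands in for the freed netif so that a stale reference is reported instead of actually crashing.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Outcome of one poll of the toy socket (hypothetical, for illustration only).
enum class PollResult { Empty, Ok, StaleReference };

// Hypothetical stand-in for the WiFi network interface.
struct ToyNetif {
    bool alive = false;
    std::uint32_t generation = 0;  // bumped each time the interface is (re)created
};

// Hypothetical stand-in for a queued pbuf: remembers which incarnation
// of the netif the data arrived on.
struct ToyPbuf {
    std::uint32_t netif_generation;
};

struct ToyStack {
    ToyNetif wifi;
    std::deque<ToyPbuf> queue;  // models an accept backlog or UDP receive buffer

    void wifi_up() { wifi.alive = true; ++wifi.generation; }

    // Models the teardown: the netif goes away, queued entries are NOT touched.
    void wifi_down() { wifi.alive = false; }

    // Models traffic arriving while the interface is alive.
    bool incoming() {
        if (!wifi.alive) return false;
        queue.push_back(ToyPbuf{wifi.generation});
        return true;
    }

    // Models the read/accept in the driver loop: takes the oldest entry and
    // checks whether the netif it refers to still exists.
    PollResult poll() {
        if (queue.empty()) return PollResult::Empty;
        ToyPbuf entry = queue.front();
        queue.pop_front();
        bool valid = wifi.alive && entry.netif_generation == wifi.generation;
        return valid ? PollResult::Ok : PollResult::StaleReference;
    }

    // Models closing the socket: all queued entries are discarded.
    void close_socket() { queue.clear(); }
};

// Step 3 of the chain: traffic queued, WiFi torn down, next loop polls.
PollResult poll_right_after_shutdown() {
    ToyStack s; s.wifi_up(); s.incoming(); s.wifi_down();
    return s.poll();
}

// The "Note" case: the poll is skipped until WiFi is re-enabled.
PollResult poll_after_reenable() {
    ToyStack s; s.wifi_up(); s.incoming(); s.wifi_down(); s.wifi_up();
    return s.poll();
}

// Closing the socket before teardown leaves nothing stale behind.
PollResult poll_after_close() {
    ToyStack s; s.wifi_up(); s.incoming(); s.close_socket(); s.wifi_down();
    return s.poll();
}
```

In this model both poll_right_after_shutdown() and poll_after_reenable() report PollResult::StaleReference, while poll_after_close() reports PollResult::Empty, which mirrors the observation that only closing the socket clears the poison.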
Why the Poison Persists
The dangling references live inside lwIP-internal queues (TCP accept backlog, UDP receive buffer). The only way to clear them is to close the socket (lwip_close), which properly frees all queued data.
Critically, WifiShutdown() contains multiple delay() calls (totaling ~300ms) that yield to the RTOS scheduler. During these yields, the lwIP TCP/IP task runs and can queue new incoming data into sockets. This means that even if you flush a socket right before WifiShutdown() and immediately reopen it, the new socket gets re-poisoned during the delays before WiFi.disconnect() destroys the netif.
Approaches That Don't Work
- Guarding PollDnsWebserver() to skip handleClient(): Only postpones the crash. When WiFi is re-enabled, the listening socket still has the poisoned backlog. The next lwip_accept() crashes.
- Selectively draining WiFi connections from the backlog: The crash happens inside lwip_accept() before the connection is returned to user code. There is no opportunity to inspect or filter.
- Timing flags (disable_in_progress): The crash happens after WifiDisable() returns, in the same main loop iteration. No flag-based deferral helps.
- close() + begin() before WifiShutdown(): The fresh socket gets re-poisoned during the delay() calls inside WifiShutdown(), which yield to the RTOS and allow the lwIP task to queue new WiFi connections before WiFi.disconnect() destroys the netif.
The Fix
Close all sockets before WiFi teardown, reopen them after.
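Why the reopen has to happen after the teardown, and not before it, can be shown with a toy model. This is a sketch under stated assumptions rather than the code of this PR: ToyNet and both scenario functions are hypothetical, and the packets_during_delay parameter stands in for whatever traffic the network task queues while WifiShutdown() is inside its delay() calls.

```cpp
#include <cstddef>

// Hypothetical model of one socket bound to the WiFi interface.
struct ToyNet {
    bool netif_alive = true;
    bool socket_open = true;
    std::size_t queued = 0;    // entries queued in the socket, all referencing the netif
    std::size_t poisoned = 0;  // queued entries whose netif has since been destroyed

    // Traffic is only queued while the interface exists and a socket is open.
    void traffic() { if (netif_alive && socket_open) ++queued; }

    // Closing the socket discards everything that was queued in it.
    void close_socket() { socket_open = false; queued = 0; poisoned = 0; }
    void open_socket() { socket_open = true; }

    // Models the shutdown: traffic may still arrive during the delays,
    // then the netif is destroyed and every queued entry becomes stale.
    void shutdown(int packets_during_delay) {
        for (int i = 0; i < packets_during_delay; ++i) traffic();
        netif_alive = false;
        poisoned = queued;
    }
};

// Rejected approach: close and reopen the socket BEFORE the shutdown.
std::size_t reopen_before_shutdown(int packets_during_delay) {
    ToyNet n;
    n.traffic();
    n.close_socket();
    n.open_socket();
    n.shutdown(packets_during_delay);
    return n.poisoned;
}

// Ordering used by the fix: close before the shutdown, reopen after it.
std::size_t reopen_after_shutdown(int packets_during_delay) {
    ToyNet n;
    n.traffic();
    n.close_socket();
    n.shutdown(packets_during_delay);
    n.open_socket();
    return n.poisoned;
}
```

With three packets arriving during the delays, reopen_before_shutdown(3) ends with 3 poisoned entries and reopen_after_shutdown(3) ends with 0. With no traffic at all, reopen_before_shutdown(0) also ends with 0, which matches the report that the crash only shows up when traffic is active.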
TCP Webserver (C level)
WebserverStopSocket() / WebserverStartSocket() in xdrv_01_9_webserver.ino:
UDP and other sockets (driver notification)
WifiDisable() dispatches FUNC_NETWORK_DOWN before teardown and FUNC_NETWORK_UP after (if Ethernet is still up). Berry's xdrv_52 forwards these as driver events "network_down" / "network_up".
WifiDisable() (in support_wifi.ino)
Berry/Matter UDP (Berry level)
Berry events are dispatched via tasmota.event() to Tasmota drivers (objects registered with tasmota.add_driver()). They are NOT dispatched to rules (tasmota.add_rule()). The method name on the driver must match the event name.
Matter_Device (in Matter_zz_Device.be) is already a Tasmota driver. Added methods:
Matter_UDPServer (in Matter_UDPServer.be), new methods:
The loop() method already guards against udp_socket == nil (if self.udp_socket == nil return end). A nil guard was also added to send() to prevent crashes during the brief window when the socket is closed.
Why EspRestart() / deep sleep don't need the fix
Other callers of WifiShutdown() (EspRestart, DeepSleepStart, ULP sleep) don't need the flush because the device is shutting down, so the dangling pointer never gets dereferenced.
Gotchas Discovered
- Berry udp class uses close(), not stop(): The native method is mapped as close in be_udp_lib.c. Using .stop() raises attribute_error. The existing Matter_UDPServer.stop() method had the same latent bug (fixed).
- Berry event dispatch goes to drivers, not rules: callBerryEventDispatcher() calls tasmota.event(), which dispatches to driver methods via introspect.get(driver, event_name). It does NOT trigger tasmota.add_rule() callbacks. To receive these events, a class must be registered as a driver (tasmota.add_driver(self)) and have a method matching the event name.
- FUNC_NETWORK_DOWN is already called in Every250mSeconds case 3: But only when network_down is true (both WiFi AND Ethernet down), and 250ms after WifiDisable(), which is too late. Our new call in WifiDisable() fires before teardown regardless of Ethernet state.
Key Files
- tasmota/tasmota_support/support_wifi.ino: WifiDisable(), WifiShutdown(), EspRestart()
- tasmota/tasmota_xdrv_driver/xdrv_01_9_webserver.ino: WebserverStopSocket(), WebserverStartSocket(), PollDnsWebserver()
- tasmota/tasmota_xdrv_driver/xdrv_52_9_berry.ino: FUNC_NETWORK_DOWN / FUNC_NETWORK_UP
- lib/libesp32/berry_matter/src/embedded/Matter_zz_Device.be: network_down() / network_up() driver methods
- lib/libesp32/berry_matter/src/embedded/Matter_UDPServer.be: flush_socket() / reopen_socket() / send() nil guard
- lib/libesp32/berry_tasmota/src/be_udp_lib.c: udp class definition (method is close, not stop)
- tasmota/tasmota_support/support_tasmota.ino: Every250mSeconds() state machine (case 2 triggers WifiDisable)
- tasmota/tasmota.ino: WIFI struct definition
- NetworkServer: sockfd is private, no accessor; cannot patch externally
- WebServer: handleClient() calls _server.accept() → lwip_accept()
- NetworkUDP: parsePacket() calls recvfrom() → lwip_recvfrom()
Checklist:
NOTE: The code change must pass CI tests. Your PR cannot be merged unless tests pass