Skip to content

Commit f3d4c64

Browse files
committed
log: bypass per-thread pool once shutdown has freed it
`fr_atexit_thread_trigger_all()` runs each registered thread destructor on the calling (main) thread, including `_fr_log_pool_free` for every worker that ever logged. That frees the underlying pool chunk but can't reset the `_Thread_local fr_log_pool` slot in any thread other than main, so threads spawned outside our schedule (librdkafka's bg threads, perl, etc.) keep a dangling pointer. The next log call from those threads (typically the "Terminating instance" debug line that librdkafka emits inside `rd_kafka_destroy()` during `mod_detach`) hands the dead pointer to `talloc_new` and aborts with "Bad talloc magic value". Add `fr_log_disable_pools()` and call it from radiusd right after the trigger. `fr_log_pool_init()` short-circuits to NULL once set, so the TLS read is skipped and downstream `talloc_new(NULL)` allocates a top-level chunk for the duration of the line. Relaxed atomic because the flag is a single-writer signal with no other state to synchronise through it.
1 parent 11b8427 commit f3d4c64

3 files changed

Lines changed: 54 additions & 1 deletion

File tree

src/bin/radiusd.c

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1122,6 +1122,18 @@ do { \
11221122
*/
11231123
(void) fr_schedule_destroy(&sc);
11241124

1125+
/*
1126+
* Threads not managed by the schedule (e.g. librdkafka's
1127+
* bg threads) hold a `_Thread_local fr_log_pool` we can't
1128+
* clear from this thread. `fr_atexit_thread_trigger_all`
1129+
* below frees the underlying chunks, leaving those slots
1130+
* dangling. Flip the logger into pool-less mode first so
1131+
* any log line that races the trigger - or fires after it -
1132+
* goes through `talloc_new(NULL)` instead of dereferencing
1133+
* a freed pool.
1134+
*/
1135+
fr_log_disable_pools();
1136+
11251137
/*
11261138
* Ensure all thread local memory is cleaned up
11271139
* before we start cleaning up global resources.

src/lib/util/log.c

Lines changed: 40 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -29,6 +29,7 @@ RCSID("$Id$")
2929
#include <freeradius-devel/util/value.h>
3030

3131
#include <fcntl.h>
32+
#include <stdatomic.h>
3233
#ifdef HAVE_FEATURES_H
3334
# include <features.h>
3435
#endif
@@ -41,6 +42,23 @@ int fr_debug_lvl = 0;
4142

4243
static _Thread_local TALLOC_CTX *fr_log_pool;
4344

45+
/** Latched once shutdown has freed every thread's log pool
46+
*
47+
* `fr_atexit_thread_trigger_all()` runs every registered thread destructor
48+
* on the calling (main) thread, so it frees the log pool memory for threads
49+
* whose TLS slot it can't reach (librdkafka's bg threads, anything spawned
50+
* by a third-party library that bypasses our schedule). Those threads
51+
* still hold the now-dangling pointer in their `_Thread_local fr_log_pool`,
52+
* and will hand it to `talloc_new` on the next log call - "Bad talloc magic
53+
* value" abort.
54+
*
55+
* Once set, `fr_log_pool_init()` ignores the TLS slot entirely and returns
56+
* NULL; downstream `talloc_new(NULL)` / `talloc_asprintf(NULL, ...)` calls
57+
* just allocate top-level chunks for the duration of the log line. No
58+
* pooling, no TLS, safe from any thread.
59+
*/
60+
static atomic_bool log_pools_disabled;
61+
4462
static uint32_t location_indent = 30;
4563
static fr_event_list_t *log_el; //!< Event loop we use for process logging data.
4664

@@ -303,17 +321,38 @@ static int _fr_log_pool_free(void *arg)
303321
return 0;
304322
}
305323

324+
/** Disable per-thread log pools for the rest of the process lifetime
325+
*
326+
* Call this from the main thread immediately after
327+
* `fr_atexit_thread_trigger_all()`, which frees every other thread's log
328+
* pool but can't reset their `_Thread_local` slot. After this returns,
329+
* subsequent `fr_log` calls fall back to `talloc_new(NULL)` instead of
330+
* touching the (now dangling) TLS pool pointer.
331+
*/
332+
void fr_log_disable_pools(void)
333+
{
334+
atomic_store_explicit(&log_pools_disabled, true, memory_order_relaxed);
335+
}
336+
306337
/** talloc ctx to use when composing log messages
307338
*
308339
* Functions must ensure that they allocate a new ctx from the one returned
309340
* here, and that this ctx is freed before the function returns.
310341
*
311-
* @return talloc pool to use for scratch space.
342+
* @return talloc pool to use for scratch space, or NULL if pools have been
343+
* disabled - callers must tolerate a NULL return.
312344
*/
313345
TALLOC_CTX *fr_log_pool_init(void)
314346
{
315347
TALLOC_CTX *pool;
316348

349+
/*
350+
* Post-trigger of every thread atexit handler the TLS slot
351+
* may be a dangling pointer for any thread we don't own
352+
* (librdkafka's bg threads etc.) - skip the pool entirely.
353+
*/
354+
if (unlikely(atomic_load_explicit(&log_pools_disabled, memory_order_relaxed))) return NULL;
355+
317356
pool = fr_log_pool;
318357
if (unlikely(!pool)) {
319358
if (fr_atexit_is_exiting()) return NULL; /* No new pools if we're exiting */

src/lib/util/log.h

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -196,6 +196,8 @@ int fr_log_close(fr_log_t *log) CC_HINT(nonnull);
196196

197197
TALLOC_CTX *fr_log_pool_init(void);
198198

199+
void fr_log_disable_pools(void);
200+
199201
int fr_log_global_init(fr_event_list_t *el, bool daemonize) CC_HINT(nonnull);
200202

201203
void fr_log_global_free(void);

0 commit comments

Comments
 (0)