Skip to content

Fix broken log setting#3062

Merged
turetske merged 10 commits into
PelicanPlatform:mainfrom
jhiemstrawisc:issue-3036
Feb 3, 2026
Merged

Fix broken log setting#3062
turetske merged 10 commits into
PelicanPlatform:mainfrom
jhiemstrawisc:issue-3036

Conversation

@jhiemstrawisc

Copy link
Copy Markdown
Member

PR #2897 introduced a few new helper functions that should be used in most cases when getting/setting log levels. These new helpers were needed because the approach in that PR was to crank the internal log level to debug regardless of the configured value, and to then filter which messages were actually printed via logging hooks. The key is that log.GetLevel() is now permanently locked to Debug, so to get the log level that actually corresponds to what users will find, you now need to use config.GetEffectiveLogLevel(). Similar for config.SetLogging().

However, while that PR introduced this requirement it actually follow it in the new code it wrote, let alone go through the repo and clean up all the old log.{Get,Set}Level() calls, leading to a variety of bugs.

I've gone through every instance of .{Get,Set}level() from the logrus library to see whether they should be updated with the helper functions, and in most cases the answer was "yes."

However, I realized I could keep pulling and pulling on the thread, so here are a few things I noticed but did not decide to clean up:

  • The logging API introduced in Add runtime log level management API with automatic restoration #2897 used to give the perma-locked "Debug" level no matter how your service was configured because it used log.GetLevel() instead of the new helpers. I fixed the API but expected to see a test fail because the API now actually returns the correct logging level. However, no tests failed and that indicates this either isn't currently tested or isn't tested correctly.
  • Some comments in the code around log.SetLevel() seemed to contradict what the code was actually doing, e.g. this comment says we shouldn't call log.SetLevel() directly despite the fact its called three lines later.
  • The pelican server set-logging-level does not follow the logging hierarchy inheritance protocol discussed in the docs, e.g. here. In some ways this might be an accidental feature because it means that setting Logging.Level to something like debug doesn't cause xrootd to restart with the updated logging levels.

While working on this, I also noticed that in the last ~month, some unit test started littering my local repo's xrootd/ directory with files like authfile-cache-generated and copied-tls-creds.crt.tmp about 50% of the time, but only when I run all the unit tests with go ./.... I spent a solid 45 minutes trying to track it down because this really bugs me, but I couldn't find the source and wasn't willing to start git bisecting.

These are all small pet peeves I noticed while I was poking around
other things related to logging.
While I was looking into the other logging bugs I fix in the
encompassing PR, I noticed that drift in our default log levels
had rendered a few of these comments out of date. It also looks like
I forgot to give a Lotman logging directive a default.
…base

The PR that introduced these config functions didn't actually go to
clean up _all_ the code that was getting/setting log levels the old way.
In fact, that PR continued to add additional `log.GetLevel()` calls even
after introducing the helper functions, and the places where this was
done didn't work as expected (e.g. the log web UI API). When I fixed the
log API so it didn't always return "debug", I was alarmed to see the
tests continued passing, but I decided to stop pulling on the thread
there or this is bound to spiral.

I went through and examined case by case to see whether each invocation
of `log.{Get,set}Level` should be modified, and in most cases the answer
was yes.
@jhiemstrawisc jhiemstrawisc added this to the v7.23 milestone Feb 2, 2026
@jhiemstrawisc jhiemstrawisc added bug Something isn't working configuration labels Feb 2, 2026
@jhiemstrawisc

Copy link
Copy Markdown
Member Author

I started adding a bit of the evidence from my testing to demonstrate that this diff fixed the original bug pointed out by @williamnswanson (sub logging levels like Logging.Origin.Xrd were always locked on "debug"), and in the process found another bug that prevents setting any of the sub logging levels to trace. I'll keep working on it.

@turetske turetske left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment that doesn't seem to track, other than that. Seems reasonable. But I also want to wait to see what the GitHub actions do as well as test locally.

Comment thread config/config.go Outdated
Previously, `GetEffectiveLogLevel()` was trying to deduce the log level
in a really convoluted way by getting the last level in the hook that
wasn't found in the overall set of valid levels. However, when you
provide `trace` as the configured log level, _all_ values are found in
the hook, triggering the fallthrough to `log.GetLevel()`, which is
always "debug".

Since the log level enum starts at 0 and is just
panic (0), fatal (1), error (2), warning (3), info (4), debug (5), trace (6)

it's much easier to find the configured logging level by... taking a max
…gging'

These two functions called 'log.SetLevel(log.DebugLevel)' for the underlying
logrus object because we want to capture _all_ logged messages and then
filter based on the effective log level.

However, this neglected the fact that we do in fact use trace logging
in Pelican outside of XRootD! As these two functions were originally
implemented, they effectively disabled all trace logging.

The fix is to enable trace logging under the hood and then rely on filtering
to choose which get displayed.
@jhiemstrawisc

Copy link
Copy Markdown
Member Author

As it turns out, #2897 also completely disabled "trace"-level logging in the Pelican process. There were actually two additional, related bugs here:

  1. GetEffectiveLogLevel wasn't working correctly with "trace"
  2. The SetLogging() and initFilterLogging() methods set the global logrus level to "debug" internally, which prevented any trace logging.

Combined, you could neither tell when something was set to trace, and you couldn't produce trace logs when they appeared in the code. Addressed in the latest commits.

@turetske turetske left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor comment that seems like it was missed. Otherwise I tested and it seems to have fixed the problem. I'm not approving because I'm waiting for GitHub actions to be working again.

Comment thread config/logging.go Outdated
@jhiemstrawisc jhiemstrawisc linked an issue Feb 3, 2026 that may be closed by this pull request
@jhiemstrawisc

jhiemstrawisc commented Feb 3, 2026

Copy link
Copy Markdown
Member Author

As a heads up, it looks like e2e_fed_tests/logging_level_test.go::TestCLILoggingLevelChanges might be the latest new flaky test. In particular, it was failing whenever I ran locally because my testing environment set a timeout of 30s, but the test actually builds the Pelican binary from scratch so it was taking closer to a minute to run. I think it's failing in CI sometimes for a similar reason.

I think it's probably bad practice to be building the binary as part of these CI tests because of the overhead, and because we don't have to do that when we come through the cmd package. In cmd/ we run the commands directly using cobra. Maybe having the ability to access these commands in e2e_fed_tests is a good enough reason to export the cobra primitives?

@turetske turetske left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@turetske turetske merged commit 83440e9 into PelicanPlatform:main Feb 3, 2026
30 of 32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working configuration

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Logging is not inheriting the values from Logging.Level

2 participants