Fix broken log setting#3062
Conversation
These are all small pet peeves I noticed while I was poking around other things related to logging.
While I was looking into the other logging bugs I fix in the encompassing PR, I noticed that drift in our default log levels had rendered a few of these comments out of date. It also looks like I forgot to give a Lotman logging directive a default.
…base
The PR that introduced these config functions didn't actually go to
clean up _all_ the code that was getting/setting log levels the old way.
In fact, that PR continued to add additional `log.GetLevel()` calls even
after introducing the helper functions, and the places where this was
done didn't work as expected (e.g. the log web UI API). When I fixed the
log API so it didn't always return "debug", I was alarmed to see the
tests continued passing, but I decided to stop pulling on the thread
there or this is bound to spiral.
I went through and examined case by case to see whether each invocation
of `log.{Get,set}Level` should be modified, and in most cases the answer
was yes.
|
I started adding a bit of the evidence from my testing to demonstrate that this diff fixed the original bug pointed out by @williamnswanson (sub logging levels like |
turetske
left a comment
There was a problem hiding this comment.
One comment that doesn't seem to track, other than that. Seems reasonable. But I also want to wait to see what the GitHub actions do as well as test locally.
Previously, `GetEffectiveLogLevel()` was trying to deduce the log level in a really convoluted way by getting the last level in the hook that wasn't found in the overall set of valid levels. However, when you provide `trace` as the configured log level, _all_ values are found in the hook, triggering the fallthrough to `log.GetLevel()`, which is always "debug". Since the log level enum starts at 0 and is just panic (0), fatal (1), error (2), warning (3), info (4), debug (5), trace (6) it's much easier to find the configured logging level by... taking a max
…gging' These two functions called 'log.SetLevel(log.DebugLevel)' for the underlying logrus object because we want to capture _all_ logged messages and then filter based on the effective log level. However, this neglected the fact that we do in fact use trace logging in Pelican outside of XRootD! As these two functions were originally implemented, they effectively disabled all trace logging. The fix is to enable trace logging under the hood and then rely on filtering to choose which get displayed.
|
As it turns out, #2897 also completely disabled "trace"-level logging in the Pelican process. There were actually two additional, related bugs here:
Combined, you could neither tell when something was set to trace, and you couldn't produce trace logs when they appeared in the code. Addressed in the latest commits. |
turetske
left a comment
There was a problem hiding this comment.
One minor comment that seems like it was missed. Otherwise I tested and it seems to have fixed the problem. I'm not approving because I'm waiting for GitHub actions to be working again.
|
As a heads up, it looks like I think it's probably bad practice to be building the binary as part of these CI tests because of the overhead, and because we don't have to do that when we come through the |
PR #2897 introduced a few new helper functions that should be used in most cases when getting/setting log levels. These new helpers were needed because the approach in that PR was to crank the internal log level to
debugregardless of the configured value, and to then filter which messages were actually printed via logging hooks. The key is thatlog.GetLevel()is now permanently locked toDebug, so to get the log level that actually corresponds to what users will find, you now need to useconfig.GetEffectiveLogLevel(). Similar forconfig.SetLogging().However, while that PR introduced this requirement it actually follow it in the new code it wrote, let alone go through the repo and clean up all the old
log.{Get,Set}Level()calls, leading to a variety of bugs.I've gone through every instance of
.{Get,Set}level()from the logrus library to see whether they should be updated with the helper functions, and in most cases the answer was "yes."However, I realized I could keep pulling and pulling on the thread, so here are a few things I noticed but did not decide to clean up:
log.GetLevel()instead of the new helpers. I fixed the API but expected to see a test fail because the API now actually returns the correct logging level. However, no tests failed and that indicates this either isn't currently tested or isn't tested correctly.log.SetLevel()seemed to contradict what the code was actually doing, e.g. this comment says we shouldn't calllog.SetLevel()directly despite the fact its called three lines later.pelican server set-logging-leveldoes not follow the logging hierarchy inheritance protocol discussed in the docs, e.g. here. In some ways this might be an accidental feature because it means that settingLogging.Levelto something like debug doesn't cause xrootd to restart with the updated logging levels.While working on this, I also noticed that in the last ~month, some unit test started littering my local repo's
xrootd/directory with files likeauthfile-cache-generatedandcopied-tls-creds.crt.tmpabout 50% of the time, but only when I run all the unit tests withgo ./.... I spent a solid 45 minutes trying to track it down because this really bugs me, but I couldn't find the source and wasn't willing to start git bisecting.