Fix pelican-server binary panic and Upgrade xrdhttp-pelican to v0.0.7 #2476

Merged
h2zh merged 2 commits into
PelicanPlatform:main from
h2zh:xrdhttp-pelican-0.0.7
Jul 10, 2025

Conversation

h2zh commented Jul 8, 2025

Contributor
  1. Set the UID and GID in drop privs mode with raw syscalls instead of syscall.Setgid() and syscall.Setuid(). Both wrappers go through the AllThreadsSyscall mechanism, which panics in the pelican-server cache binary because CGO is disabled.
  2. Upgrade to the new xrdhttp-pelican release, which should enable Pelican's 'Drop Privileges' mode. The Authfile and SciTokens config file are now moved into directories owned by the xrootd user.

How to test

Check out this branch, then rebuild the container.
Run rpm -q xrdhttp-pelican in the new container. You should see xrdhttp-pelican-0.0.7-........

For a further check, spin up the federation with the following config, for the Cache only. Make sure the running logs contain no obvious error such as Unable to find /run/pelican/xrootd/cache/authfile-cache-generated; no such file or directory.

Server:
  DropPrivileges: true
  UnprivilegedUser: pelican

h2zh linked an issue Jul 8, 2025 that may be closed by this pull request
h2zh added the bug and cache labels Jul 8, 2025
h2zh added this to the v7.18 milestone Jul 8, 2025
h2zh added the create-patch label Jul 8, 2025
h2zh requested a review from patrickbrophy July 8, 2025 21:21

patrickbrophy left a comment

Contributor

I ran both tests. The first one passed; for the second, I saw that xrootd launched as the xrootd user. I ran into this error, but it may just be a bad config on my part:

cache director-based health test clean up routine failed to add directory /run/pelican/cache/namespace/pelican/monitoring to watch: no such file or directory
DEBUG[2025-07-09T15:18:24Z] Populating server's issuer URL as 'https://16ce70e416a9:7002' from configured value of 'Server.ExternalWebUrl'
DEBUG[2025-07-09T15:18:24Z] Signing token with key id: 9PJvIox6FfHzhUYkVkD4yjXJsKrEuPFcXE6USAUgXmQ
ERROR[2025-07-09T15:18:24Z] Failed to get a federation token: Failed to verify advertise token
ERROR[2025-07-09T15:18:24Z] Failed to calculate lifetime of federation token: failed to parse token: EOF.

h2zh commented Jul 9, 2025

Contributor Author

@patrickbrophy Could you try commenting out the new config and running the Cache for a while (~5 min)?

Server:
  # DropPrivileges: true
  # UnprivilegedUser: pelican

Then kill the process, uncomment the configs above, and restart it. Do those error logs still appear at that point?

@patrickbrophy

Contributor

I just attempted this and the cache was running when the config was commented out. I then killed the cache and reapplied the drop privs configuration. Then the cache crashed with this error:

DEBUG[2025-07-09T16:38:03Z] Cache is registered
INFO[2025-07-09T16:38:03Z] Dropping privileges to user pelican
panic: doAllThreadsSyscall not supported with cgo enabled

goroutine 1 [running]:
syscall.runtime_doAllThreadsSyscall(0x40014075f8?, 0x11d53ac?, 0x363d4a0?, 0x4?, 0x2abd?, 0x2abd?, 0x0?)
        /usr/local/go/src/runtime/os_linux.go:710 +0x364
syscall.AllThreadsSyscall(0x4000ba8af0?, 0x7?, 0x40014075f8?, 0x11d53ec?)
        /usr/local/go/src/syscall/syscall_linux.go:1121 +0x4c
syscall.Setgid(0x363d460?)
        /usr/local/go/src/syscall/syscall_linux.go:1174 +0xec
github.com/pelicanplatform/pelican/launchers.dropPrivileges()
        /pelican-build/launchers/droppriv_unix.go:48 +0x108
github.com/pelicanplatform/pelican/launchers.LaunchModules({0x275be50, 0x4000a5dbf0}, 0x1)
        /pelican-build/launchers/launcher.go:350 +0x1a74
main.serveCache(0x4000eb6900?, {0x17f749a?, 0x4?, 0x17f7456?})
        /pelican-build/cmd/cache_serve.go:31 +0x2c
github.com/spf13/cobra.(*Command).execute(0x3642c40, {0x40004400c0, 0x2, 0x2})
        /root/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:940 +0x648
github.com/spf13/cobra.(*Command).ExecuteC(0x3649f40)
        /root/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068 +0x320
github.com/spf13/cobra.(*Command).Execute(...)
        /root/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
        /root/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:985
main.Execute()
        /pelican-build/cmd/root.go:91 +0xd4
main.handleCLI({0x400014a000, 0x5, 0x5})
        /pelican-build/cmd/main.go:69 +0x220
main.main()
        /pelican-build/cmd/main.go:38 +0x54

h2zh commented Jul 9, 2025

Contributor Author

@patrickbrophy Could you run goreleaser --clean --snapshot to rebuild the binary on this branch? The error logs point to a cgo problem, but Pelican is built without cgo by default (CGO_ENABLED=0 on lines 34 and 73 of .goreleaser.yml).
I tested this again in a fresh container and could not reproduce these errors, though I did find a new one: Failed to write the federation token: failed to create temporary token file: open /etc/pelican/.fedtoken.1752097480292003180.1495356951: permission denied

- bypass the AllThreadsSyscall mechanism used by syscall.Setgid() and syscall.Setuid(), which caused the panic when CGO is disabled
- pelican-server cache binary no longer panics with CGO disabled
h2zh changed the title from "Upgrade xrdhttp-pelican to v0.0.7" to "Fix pelican-server binary panic and Upgrade xrdhttp-pelican to v0.0.7" Jul 10, 2025
patrickbrophy self-requested a review July 10, 2025 20:22

patrickbrophy left a comment

Contributor

After lots of testing, things are looking good!

LGTM

h2zh merged commit 3ac2a61 into PelicanPlatform:main Jul 10, 2025
27 of 28 checks passed

Labels

bug: Something isn't working
cache: Issue relating to the cache component
create-patch: Patch this into multiple versions of Pelican

Development

Successfully merging this pull request may close these issues.

Authfile not found in Cache's drop privs mode

2 participants