Allow connection broker to be used for director->cache communications #2378

Merged
bbockelm merged 3 commits into PelicanPlatform:main from bbockelm:universal_broker on Jun 19, 2025
@bbockelm bbockelm commented Jun 9, 2025

Collaborator

This hooks the connection broker into the director, allowing the director to contact a cache through the broker.

With this, caches no longer need to allow incoming connections from the director to enable functionality like monitoring.

bbockelm added 2 commits May 31, 2025 09:47
When the director wants to connect to services leveraging the broker,
have the global transport object use a new broker-aware dialer function.
Comment thread config/transport.go Dismissed
@bbockelm bbockelm added the enhancement, cache, director, and broker labels Jun 9, 2025
@bbockelm bbockelm linked an issue Jun 9, 2025 that may be closed by this pull request
@bbockelm bbockelm requested a review from h2zh June 10, 2025 13:54
@h2zh h2zh self-assigned this Jun 10, 2025

@h2zh h2zh left a comment

Contributor

My review focuses on the new dialer. The key of brokerEndpoints appears to be designed to hold the service name, but it actually holds a URL.

When you address these comments, could you add the changes as a new commit instead of rebasing them into existing commits? That way I can see what has changed.

Comment thread config/transport.go Outdated
Comment thread director/cache_ads.go Outdated
Comment thread broker/dialer.go Outdated
Comment thread director/cache_ads.go
Comment on lines +184 to +187
brokerDialer.UseBroker(sType, sAd.WebURL.Host, sAd.BrokerURL.String())
if sAd.Type == server_structs.OriginType.String() {
	brokerDialer.UseBroker(sType, sAd.URL.Host, sAd.BrokerURL.String())
}

Contributor

If the service name is the key for brokerEndpoints (as it is in this PR), the code here should be:

Suggested change
- brokerDialer.UseBroker(sType, sAd.WebURL.Host, sAd.BrokerURL.String())
- if sAd.Type == server_structs.OriginType.String() {
- 	brokerDialer.UseBroker(sType, sAd.URL.Host, sAd.BrokerURL.String())
- }
+ brokerDialer.UseBroker(sType, sAd.Name, sAd.BrokerURL.String())

Collaborator Author

I'll tweak the comments and inline documentation but the original form is correct.

The second argument is the address - as passed to the DialContext function. This is typically the host:port used in a URL to contact the server (for the origin, this is the xrootd URL and web URL; for the cache, only the web URL).

Long-term, this is indeed something I'd like to clean up: what sort of "hostname" should be provided in URLs that are meant to be used with the broker? It's not clear that the self-identified hostname is the right thing.

For now, I think we shouldn't try to clean this piece up in this PR.

Comment thread broker/dialer.go Outdated
Comment on lines +89 to +104
func (d *BrokerDialer) DialContext(ctx context.Context, network, addr string) (net.Conn, error) {
	info := d.brokerEndpoints.Get(addr)
	if info == nil {
		// If the endpoint is not found in the cache, use the default dialer.
		return d.dialerContext(ctx, network, addr)
	}

	sType := info.Value().ServerType
	prefix := ""
	if sType.IsEnabled(server_structs.CacheType) {
		addrSplit := strings.SplitN(addr, ":", 2)
		prefix = "/caches/" + addrSplit[0]
	} else {
		prefix = "/origins/" + addr
	}
	return ConnectToOrigin(ctx, info.Value().BrokerUrl, prefix, addr)

Contributor

Pelican service names, prefixes, and URLs/WebURLs follow this pattern (no exceptions in the Director data):

Origin: prefix = "/origins/" + webUrl
Cache: prefix = "/caches/" + name

addr in this function appears to be the webUrl, so the prefix this snippet produces is incorrect for all cache servers.

However, there is no fixed pattern relating the Pelican service name to the webURL, so we can't convert addr (webURL) to a name and then to a prefix. According to the function signature of ConnectToOrigin, its last input parameter is the name of the origin/cache, which is missing from this code snippet. Also, brokerEndpoints requires the name as its key.
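
The prefix logic under discussion can be isolated as a small function. The sketch below mirrors the branch in the quoted `DialContext` hunk so the resulting strings are easy to compare with the pattern listed above; `brokerPrefix`, the `isCache` flag, and the example addresses are hypothetical and do not appear in the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// brokerPrefix mirrors the branch in the quoted hunk: for a cache it keeps
// only the host part of addr; for an origin it keeps addr unchanged.
func brokerPrefix(isCache bool, addr string) string {
	if isCache {
		// Everything before the first ':' is treated as the host.
		host := strings.SplitN(addr, ":", 2)[0]
		return "/caches/" + host
	}
	return "/origins/" + addr
}

func main() {
	fmt.Println(brokerPrefix(true, "cache.example.org:8444"))
	// prints "/caches/cache.example.org"
	fmt.Println(brokerPrefix(false, "origin.example.org:8443"))
	// prints "/origins/origin.example.org:8443"
}
```

Under these assumptions the cache prefix is derived from the dial address rather than from a separately registered name, so the two agree only when a cache's name equals the host part of its web URL.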

Collaborator Author

Per the unit tests, this seems to match the current default behaviors, no? Are you saying this isn't matching what's currently in the OSDF?

Unfortunately, if the unit tests don't match OSDF's behavior, we might have to accept this as scaffolding to a larger set of fixes for the broker URL & naming in general as a follow-up in v7.18 as discussed today.

Comment thread broker/dialer.go
Tries to make the functionality clearer (via variable and method names)
and adds a few comments in code the reviewer found confusing.
@bbockelm

Collaborator Author

Since I've got a few pending PRs that depend on this one, and the remaining issues are around the registry naming (which we know needs to be cleaned up), I'm going to pull PI privilege and go ahead and merge this. I will create a follow-up issue for the naming piece.

@bbockelm bbockelm merged commit 968b70a into PelicanPlatform:main Jun 19, 2025
88 of 101 checks passed

Labels

broker, cache, director, enhancement

Development

Successfully merging this pull request may close these issues.

Fix connection broker mode for director communication

3 participants