Skip to content

Bugfix for downtime handling in Director#2323

Merged
h2zh merged 1 commit into
PelicanPlatform:mainfrom
h2zh:fix-downtime-registry-director
May 19, 2025
Merged

Bugfix for downtime handling in Director#2323
h2zh merged 1 commit into
PelicanPlatform:mainfrom
h2zh:fix-downtime-registry-director

Conversation

@h2zh

@h2zh h2zh commented May 19, 2025

Copy link
Copy Markdown
Contributor

In Director, expired downtimes set by Registry should also be cleared when no active downtimes set by Registry.

How to test this PR locally

Spin up Registry, Director, Origin (or Cache). In Registry, create a downtime (ends at X minutes from now) for the Origin. Go to Director to verify this Origin is in downtime. X minutes later, this downtime expires. Check this Origin in Director again to make sure it is not in downtime.

If you find these downtime logics are too complicated... here is a diagram for the Downtime management workflow in Pelican #2218

- In Director, Clear expired downtimes set by Registry when no active downtimes set by Registry
@h2zh h2zh added this to the v7.17 milestone May 19, 2025
@h2zh h2zh added bug Something isn't working director Issue relating to the director component labels May 19, 2025
@h2zh h2zh requested a review from CannonLock May 19, 2025 16:15

@CannonLock CannonLock left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

As a comment I wish some of this was generalized in a function. I imagine ( without looking ) the code for the topo and server downtime updates looks similar to this.

	// Remove existing filteredSevers that are fetched from the Registry first
	for key, val := range filteredServers {
		if val == tempFiltered {
			delete(filteredServers, key)
		}
	}

	// Build a new map to replace the in-memory federationDowntimes map
	newFederationDowntimes := make(map[string][]server_structs.Downtime)
	currentTime := time.Now().UTC().UnixMilli()

	for _, downtime := range runningServersDowntimes {
		// Save all active and future downtimes to the new map
		newFederationDowntimes[downtime.ServerName] = append(newFederationDowntimes[downtime.ServerName], downtime)

		// Check existing downtime filter
		originalFilterType, hasOriginalFilter := filteredServers[downtime.ServerName]
		// If this server is already put in downtime, we don't need to do anything to the filteredServers map
		if hasOriginalFilter && originalFilterType != tempAllowed {
			continue
		}
		// Otherwise, if it is an active downtime, we need to put it into the filteredServers map
		if currentTime >= downtime.StartTime && (currentTime <= downtime.EndTime || downtime.EndTime == -1) {
			filteredServers[downtime.ServerName] = tempFiltered
		}
	}

@CannonLock

CannonLock commented May 19, 2025

Copy link
Copy Markdown
Contributor

This is the React background talking but I would also prefer that filtered servers wasn't a separate value but a memoized function that only updates when the inputs change.

Having this as its own state allows potential for it to desync from its parent.

@h2zh

h2zh commented May 19, 2025

Copy link
Copy Markdown
Contributor Author
  1. That's a good call. Though I'm afraid their differences are bigger than they look like. For example, fed and topo downtimes do a complete replacement every time the new data comes in, while server downtime only replace a specific server's downtime when a new server ad comes in. I'll keep it in mind and hope one day I come up with a way to gracefully consolidate them. 🤨

  2. Go doesn't have built-in memoization hooks like React. This memoized function will be called frequently (e.g. iterates over all servers to check if they are in downtime, etc.), which leads to a performance burden if the memoization is not implemented properly. I will keep it as it is until I find a good way to implement memoization in Go.

@h2zh h2zh merged commit d4e7025 into PelicanPlatform:main May 19, 2025
26 of 28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working director Issue relating to the director component

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Expired downtimes set by Registry are not cleared when no active/future downtimes set by Registry

2 participants