@apollo/gateway: Make ApolloGateway.stop() reliable and required #452
Merged
Member
Author
Before merging this I should implement the Apollo Server change described in it and release v2.20. It's ready for review, but I'll mark it as a draft for this reason.
trevor-scheer
approved these changes
Feb 4, 2021
Contributor
Should I be able to call load again after awaiting the stop?
Contributor
Probably disregard, I can tell below it's explicitly written with the intention of not allowing that.
Member
Author
FWIW I actually don't know offhand if you get unbounded stack sizes from recursion in async functions.
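On the stack question: as far as I understand the JavaScript execution model, code after an `await` resumes from the microtask queue on a fresh call stack, so a recursive call made after an `await` should not deepen the stack (the pending promises do accumulate on the heap instead). A minimal TypeScript sketch to check this, where `countDown` is a hypothetical helper and not anything in the gateway:

```typescript
// Hypothetical helper, not gateway code: recurses n levels deep,
// but always with an `await` before the recursive call.
async function countDown(n: number): Promise<number> {
  if (n === 0) return 0;
  // Yield to the microtask queue. The continuation below runs on a
  // fresh call stack, so the recursion depth is not reflected in it.
  await Promise.resolve();
  return 1 + (await countDown(n - 1));
}
```

Calling `countDown(100000)` should resolve to `100000`, whereas a plain synchronous recursion of that depth would typically overflow the stack. Removing the `await Promise.resolve()` line makes the calls nest synchronously again, so the `await` needs to come before the recursive call for this to hold.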
glasser
added a commit
to apollographql/apollo-server
that referenced
this pull request
Feb 8, 2021
Because AS is what invokes ApolloGateway.load, it should be its responsibility to invoke the matching stop method. apollographql/federation#452 (aimed at @apollo/gateway 0.23) will change ApolloGateway to no longer "unref" its polling timer: i.e., it makes calling `stop()` actually part of its expected API instead of something you can ignore if you feel like it without affecting program shutdown. It has a bit of a hack to still unref the timer if it looks like you're using an old (pre-2.18) version of Apollo Server, but this PR (which will be released in v2.20.0) will make ApolloServer stop the gateway for you. Part of fixing #4428.
glasser
added a commit
to apollographql/apollo-server
that referenced
this pull request
Feb 8, 2021
Because AS is what invokes ApolloGateway.load, it should be its responsibility to invoke the matching stop method. apollographql/federation#452 (aimed at @apollo/gateway 0.23) will change ApolloGateway to no longer "unref" its polling timer: i.e., it makes calling `stop()` actually part of its expected API instead of something you can ignore if you feel like it without affecting program shutdown. It has a bit of a hack to still unref the timer if it looks like you're using an old (pre-2.18) version of Apollo Server, but this PR (which will be released in v2.20.0) will make ApolloServer stop the gateway for you. Part of fixing #4428.

Also simplify typings of toDispose set. `() => ValueOrPromise<void>` is a weird type because `void` means "I'm not going to look at the return value", which is sorta incompatible with "but I need to see if it's a Promise or not". So just make these all async (changing a couple of implementations by adding `async`).
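The typing change described in that commit message can be sketched roughly as follows. This is an illustration only: `Disposer`, `disposeAll`, and `log` are hypothetical names, and the real Apollo Server code may be organized differently. The point is that once every disposer is typed `() => Promise<void>`, the caller can await all of them without inspecting return values.

```typescript
// Hypothetical names; a sketch of the typing idea, not Apollo Server's code.
type Disposer = () => Promise<void>;

const log: string[] = [];
const toDispose = new Set<Disposer>();

// A cleanup step that used to be synchronous only needs the `async` keyword.
toDispose.add(async () => {
  log.push("sync cleanup");
});

// A cleanup step that is really asynchronous fits the same type.
toDispose.add(async () => {
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  log.push("async cleanup");
});

// Every disposer returns a Promise, so they can all be awaited uniformly.
async function disposeAll(disposers: Set<Disposer>): Promise<void> {
  await Promise.all(Array.from(disposers).map((dispose) => dispose()));
}
```

With a `ValueOrPromise<void>` return type, `disposeAll` would have to check whether each result is a Promise before awaiting it, which is the awkwardness the commit message points out.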
3 tasks
The (not particularly documented) ApolloGateway.stop() method didn't reliably stop the gateway from polling. All it did was cancel a timeout. But if it was called while pollServices was in the middle of running, it would do nothing, and then pollServices would carry on and set another timeout.

Better semantics would be for stop() to reliably stop polling: allow the current poll to complete, ensure that there will be no more polls, and then (async) return. This PR implements those semantics by introducing an explicit state machine for ApolloGateway's polling.

One reason that these bugs were able to survive is that calling stop() is often unnecessary. In apollographql/apollo-server#3105 we chose to `unref` the polling timeout to allow the Node process to exit if it's the only thing left on the event loop, instead of encouraging users of `ApolloGateway` to be responsible and call `stop()` themselves. While this may be reasonable when the gateway's lifecycle is merely a schema polling timer, we may have future changes to the gateway where proper lifecycle handling is more important. So this PR also moves away from the world where it's fine to not bother to explicitly stop the gateway.

That said, in the common case, we don't want users to have to write gateway stopping code. It makes more sense for stopping the `ApolloGateway` to be the responsibility of `ApolloServer.stop()`, just as `ApolloServer.stop()` can trigger events in plugins. So in the recently-released Apollo Server v2.20, `ApolloServer.stop()` calls `ApolloGateway.stop()`. This should mean that in most cases, the missing `unref` won't keep the process running, as long as you've run `ApolloServer.stop()` (and if you don't, it's likely that other parts of the server are keeping your process running).

What if you're still running an old version of Apollo Server? For a bit of backwards compatibility, `ApolloGateway` detects if it appears to be connected to Apollo Server older than v2.18. If so, it continues to do the old `unref()`. If you're using v2.18 or v2.19, the upgrade to v2.20 should be pretty painless (it's mostly just minor bugfixes). If you're using ApolloGateway on its own for some reason, and this change causes your processes to hang on shutdown, adding a `stop()` call should be pretty straightforward. (If we learn that this change really is devastating, we can always go back to an unconditional `timer.unref()` later.)

Fixes apollographql/apollo-server#4428.
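To make the intended `stop()` semantics concrete, here is a minimal TypeScript sketch of a polling state machine. It is not the code from this PR: `Poller`, `PollerState`, `currentPhase`, and the phase names are all hypothetical, and the gateway's actual states may differ. It only illustrates the three guarantees described above: an in-flight poll is allowed to finish, no further poll is scheduled, and `stop()` resolves only after both hold.

```typescript
// Hypothetical sketch, not the gateway's implementation.
type PollerState =
  | { phase: "idle" }
  | { phase: "waiting"; timer: ReturnType<typeof setTimeout> }
  | { phase: "polling"; done: Promise<void> }
  | { phase: "stopping"; stopped: Promise<void> }
  | { phase: "stopped" };

class Poller {
  private state: PollerState = { phase: "idle" };

  constructor(
    private readonly poll: () => Promise<void>,
    private readonly intervalMs: number,
  ) {}

  currentPhase(): PollerState["phase"] {
    return this.state.phase;
  }

  start(): void {
    if (this.state.phase !== "idle") {
      throw new Error(`cannot start from phase ${this.state.phase}`);
    }
    this.scheduleNext();
  }

  private scheduleNext(): void {
    const timer = setTimeout(() => void this.runPoll(), this.intervalMs);
    this.state = { phase: "waiting", timer };
  }

  private async runPoll(): Promise<void> {
    // Errors are swallowed here only to keep the sketch short.
    const done = this.poll().catch(() => {});
    this.state = { phase: "polling", done };
    await done;
    // stop() may have moved us to "stopping" while the poll was in flight;
    // only schedule another poll if nobody asked us to stop.
    if (this.currentPhase() === "polling") this.scheduleNext();
  }

  async stop(): Promise<void> {
    const state = this.state;
    switch (state.phase) {
      case "waiting":
        // No poll in flight: cancelling the timer is enough.
        clearTimeout(state.timer);
        this.state = { phase: "stopped" };
        return;
      case "polling": {
        // Let the current poll complete, then mark ourselves stopped.
        const stopped = state.done.then(() => {
          this.state = { phase: "stopped" };
        });
        this.state = { phase: "stopping", stopped };
        return stopped;
      }
      case "stopping":
        // A concurrent stop() is already waiting; share its promise.
        return state.stopped;
      default:
        this.state = { phase: "stopped" };
        return;
    }
  }
}
```

The difference from a bare `clearTimeout` is the `"polling"` case: the old behavior corresponds to doing nothing there, after which the in-flight poll would schedule its successor. For code that uses the gateway on its own, the equivalent step is to await the gateway's `stop()` as part of process shutdown.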
c4fe737 to
d82e916
Compare