Retain age of cache and result data #1498
Conversation
Motivation: Any time a developer wants to cache data, they inevitably also want a way to invalidate the cache or know when to ignore its data. To support those use cases, the date the data was last verified as received needs to be stored.

Modifications:
- Add: `RecordRow` struct that becomes the underlying storage base type for `RecordSet`, containing the `Record` and its `lastReceivedAt` date
- Add: mechanism in `SQLiteNormalizedCache` to handle schema migrations

Result: Querying TTL and cache age based on the data's `lastReceivedAt` date will soon be possible.
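The migration mechanism itself isn't shown in this thread. As a rough illustration of the `user_version`-style pattern such a mechanism typically follows (the `MigrationRunner` type and SQL string below are hypothetical, not the PR's actual code):

```swift
// Hypothetical sketch of versioned schema migration, in the style that
// SQLite's PRAGMA user_version enables. `execute` just records statements
// here so the control flow is visible without a real database.
struct MigrationRunner {
  private(set) var schemaVersion: Int
  private(set) var executed: [String] = []

  mutating func execute(_ sql: String) { executed.append(sql) }

  // Each step bumps the stored version, so re-running is a no-op.
  mutating func migrateIfNeeded() {
    if schemaVersion < 1 {
      execute("ALTER TABLE records ADD COLUMN lastReceivedAt REAL NOT NULL DEFAULT 0")
      schemaVersion = 1
    }
  }
}
```

In the real cache the version would be read from and written back to the database via `PRAGMA user_version`, so a store created by an older library version is upgraded exactly once on open.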
Motivation: `RecordRow.lastReceivedAt` is an important property that developers want access to in order to inspect the age of a given result. To provide this, it needs to be returned in GraphQL results in some fashion.

Modifications:
- Add: `GraphQLResultContext` object for holding result metadata, such as the `resultAge`
- Add: `context` property to `GraphQLResult` that allows access to the result metadata
- Change: `NormalizedCache.loadRecords` completion to receive `RecordRow?` instead of `Record?`
- Change: `RecordSet` to use `RecordRow` rather than `Record`
- Change: `SQLiteNormalizedCache.init(fileURL:shouldVacuumOnClear:)` to be a convenience initializer
- Change: `SQLiteNormalizedCache` initializers to accept an optional `RecordSet`

Result: Developers can now access a `context` property on `GraphQLResult` to inspect the metadata behind a given result.
@Mordil: Thank you for submitting a pull request! Before we can merge it, you'll need to sign the Apollo Contributor License Agreement here: https://contribute.apollographql.com/

@designatednerd I think I addressed all of the feedback you had left on #971

@gsabran If there's a different email you want associated with the commit author attribution - let me know and I'll update the commits

I'm going to ask @martijnwalraven to also take a look at this when he's back on Tuesday - he's been neck-deep in cache stuff so I think he will have more opinions than I do.
designatednerd left a comment:
My comments are mostly about formatting and dates. I have gone down way, way, way too many date rabbit holes.
I definitely think @martijnwalraven will have more concrete feedback on the actual functionality here.
func zip<Accumulator1: GraphQLResultAccumulator, Accumulator2: GraphQLResultAccumulator, Accumulator3: GraphQLResultAccumulator, Accumulator4: GraphQLResultAccumulator>(_ accumulator1: Accumulator1, _ accumulator2: Accumulator2, _ accumulator3: Accumulator3, _ accumulator4: Accumulator4) -> Zip4Accumulator<Accumulator1, Accumulator2, Accumulator3, Accumulator4> {
  return Zip4Accumulator(accumulator1, accumulator2, accumulator3, accumulator4)
}
@martijnwalraven I feel like this is something you told me might no longer be necessary with changes to Swift's generics handling - do you have any recollection?
I think this is still needed, if Accumulator1, Accumulator2, ... are different concrete types. There might be some changes to Swift generics in the future to remove this code, specifically "Variadic Generics", but that's still a long way to go.
Yep, variadic generics is what I was talking about. But it indeed seems that might take a while to make it into the language, so we still need these workarounds.
gsabran left a comment:
Wow, this is exciting! To avoid issues with the user changing their device clock, we can use something like:
/// Return a timestamp in ms from an arbitrary origin that is consistent for a given device.
func getCurrentTimestamp() -> Int {
Int(clock_gettime_nsec_np(CLOCK_MONOTONIC) / 1000000)
}
designatednerd left a comment:
Awesome, only a couple comments - will wait for Martijn to get a look Tuesday before proceeding. Thanks for the quick turnaround!
@gsabran What's the timestamp for?
Thanks for this PR! This is definitely a worthwhile feature to add. My comments are mainly about the way we want to extend the existing API in a way that sets us up for future extensions.
The TL;DR is that I'm wary of adding specific pieces of metadata like firstReceivedAt to the core of the execution pipeline, because that has repercussions everywhere (it changes the resolver return value, needs to be passed around in all execution methods, the accumulator, etc.). Instead, I think we should take this opportunity to come up with a general metadata mechanism that we can take advantage of for other use cases.
A few additional pieces of feedback that didn't have a clear place to leave them as comments:
- The term `firstReceivedAt` seems a bit confusing to me. Maybe `lastUpdateFromServer` would be a better description? That also depends on what semantics we expect from this. I don't think we always want to update the `lastReceivedAt` with the current date for every write, because there's often a need to differentiate between server updates and (optimistic) client updates. So maybe we need a separate property for that?
- I'm also not sure the semantics of keeping this per record (as opposed to per field) make sense, because you often fetch subsets of the fields for an object (see comment below).
- Although I think it makes sense to give people access to `firstReceivedAt` from a result when they have a need for custom logic, the example code listed under Motivation seems like the main use case and I think we should have explicit support for it. Maybe some kind of `maxAge` condition on `.returnCacheDataDontFetch`, or a general way to pass a predicate for a conditional fetch (a la HTTP Cache Control)? It could also be something you configure (overridable) defaults for on `ApolloClient`. Besides avoiding repetitive code everywhere a result is received, that would also avoid an unnecessary switch to the main queue and back in case the data is stale and we need to go to the network after all.
- In addition to `maxAge`, the Apollo Cache Control spec uses `scope` to differentiate between public and private (per user) data. That may also be useful to incorporate here, or at least to make sure we leave room for additions like that.
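To make the `maxAge`-style conditional fetch concrete, here is an illustrative sketch; the policy enum and helper function are hypothetical, not Apollo API:

```swift
import Foundation

// Hypothetical maxAge-gated cache policy: return cached data only if it is
// younger than the given age, otherwise fall through to the network.
enum SketchCachePolicy {
  case returnCacheDataElseFetch(maxAge: TimeInterval)
}

// Decide whether a cached result is fresh enough to hand back without a fetch.
func shouldUseCachedResult(lastReceivedAt: Date,
                           policy: SketchCachePolicy,
                           now: Date = Date()) -> Bool {
  switch policy {
  case .returnCacheDataElseFetch(let maxAge):
    return now.timeIntervalSince(lastReceivedAt) <= maxAge
  }
}
```

Because the staleness check runs inside the client, a stale result can go straight to the network without first surfacing the cached data to the caller.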
Something else to mention is that I'm working on an overhaul of the tests, in preparation for making larger changes to the store and execution pipeline. Not sure what the right order here is for merging these two PRs. My hope is that we can merge the tests PR this week, after I've cleaned up and documented what's there right now, and Ellen has had a chance to review it.
I'll be working on larger changes in a separate PR next. That will involve getting rid of the promises implementation and making GraphQL execution synchronous (you'll still be able to perform multiple reads in parallel, it's just that the execution pipeline itself won't be asynchronous any more, which should make the code easier to reason about and solve some long standing threading issues). So we may need to coordinate on that as well to make sure we can merge both PRs.
/// The date when the result was last received.
/// - Note: Apollo may merge several records with different ages when reading from cache data.
///   When such a merge happens, this value will be the age of the oldest record.
public let resultAge: Date
I think calling this `maxAge` would make the semantics clearer, and is also consistent with how we compute this for Apollo Cache Control on the server.
import Foundation

/// Metadata about the returned result.
public struct GraphQLResultContext {
Context is a pretty overloaded term. Maybe `GraphQLResultMetadata`?
func accept(scalar: JSONValue, info: GraphQLResolveInfo) throws -> PartialResult
func acceptNullValue(info: GraphQLResolveInfo) throws -> PartialResult
func accept(scalar: JSONValue, firstReceivedAt: Date, info: GraphQLResolveInfo) throws -> PartialResult
I'd like to find a way to avoid adding individual arguments like `firstReceivedAt` to the core of the execution mechanism, because these additions have repercussions everywhere. Instead, I think we should try and come up with a way to pass around a general notion of updateable metadata (see related comment above for `GraphQLResolver`).
import Foundation

/// A row of data that contains a `Record` and some associated metadata.
public struct RecordRow {
I don't think we need an additional wrapper type for this. `Record` already keeps `key` and `fields` as separate properties, so maybe we could add metadata to that instead?
/// A resolver is responsible for resolving a value for a field.
typealias GraphQLResolver = (_ object: JSONObject, _ info: GraphQLResolveInfo) -> ResultOrPromise<JSONValue?>
typealias GraphQLResolver = (_ object: JSONObject, _ info: GraphQLResolveInfo) -> ResultOrPromise<(JSONValue?, Date)>
I'd like to avoid changing every resolver to return a tuple, especially with something specific like a Date, because that forces the whole execution pipeline to pass that through. I think what we want here instead is a generic mechanism to allow resolvers to optionally set metadata.
In Apollo Server, we add a `cacheControl` object to `info` for this purpose, so you can do:
const resolvers = {
  Query: {
    post: (_parent, { id }, _context, { cacheControl }) => {
      cacheControl.setCacheHint({ maxAge: 60 });
      return find(posts, { id });
    }
  }
}

I'm not too happy with that either, but at least it makes setting cache control hints optional and doesn't affect the rest of the execution pipeline.
I think we could either add a `metadata` property to `info`, or pass metadata in as an additional argument.
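A minimal sketch of that first option, with simplified stand-in types (not Apollo's actual `GraphQLResolveInfo`): the resolver sets metadata on `info` and its return type stays untouched.

```swift
import Foundation

// Simplified stand-in: a mutable metadata bag on the resolve info lets a
// resolver attach a receipt date without changing its return value.
final class SketchResolveInfo {
  var metadata: [String: Any] = [:]
}

typealias SketchResolver = ([String: Any], SketchResolveInfo) -> Any?

// A resolver that records a receipt date as a side channel.
let cacheResolver: SketchResolver = { object, info in
  info.metadata["lastReceivedAt"] = Date(timeIntervalSince1970: 0)
  return object["hero"]
}
```

Because metadata rides on `info`, the rest of the execution pipeline can ignore it entirely unless an accumulator chooses to read it.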
import Foundation

final class GraphQLFirstReceivedAtTracker: GraphQLResultAccumulator {
I like the idea of using a result accumulator for this, that was exactly the type of modular extension I was hoping the accumulator API would allow. The jury is still out on whether accumulators are a good idea, because they do seem to have a terrible performance impact. But we can worry about that later, when rethinking the execution pipeline as a whole.
extensions: [String: Any]?,
errors: [GraphQLError]?,
source: Source,
dependentKeys: Set<CacheKey>?) {
I wonder if `source` and `dependentKeys` would also make sense as properties on a `GraphQLResultMetadata`.
}

public func read<Query: GraphQLQuery>(query: Query) throws -> Query.Data {
public func read<Query: GraphQLQuery>(query: Query) throws -> (Query.Data, GraphQLResultContext) {
I feel returning a tuple makes the use of the store read API a bit awkward, because you're often just interested in the data, and now you have to destructure the returned tuple explicitly everywhere. Not sure if the alternative is any better though. I was thinking of returning `GraphQLResult` from all store read methods instead, but that means you have to go through `data` even if you're doing a `readObject` for an individual fragment.
Maybe a better question to ask is: do we need the public read store API to provide access to metadata at all? You can always use a fetch with `.returnCacheDataDontFetch` to read a result from the cache. I think the main purpose of the store read API is for manual store manipulation within writes, so that may not require the metadata?
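One alternative shape for the tuple problem, purely illustrative rather than a proposed Apollo signature: a small wrapper with named properties, so call sites that only want the data stay readable while metadata remains available to the callers that need it.

```swift
import Foundation

// Hypothetical read result: named fields avoid destructuring a tuple at
// every call site while still exposing metadata when wanted.
struct SketchReadResult<Value> {
  let data: Value
  let metadata: [String: Any]
}

// Stand-in for a store read returning hard-coded sample data.
func sketchRead() -> SketchReadResult<[String: String]> {
  SketchReadResult(data: ["hero": "R2-D2"],
                   metadata: ["resultAge": Date(timeIntervalSince1970: 0)])
}
```

A caller interested only in data writes `sketchRead().data`, with no tuple destructuring and no detour through a full `GraphQLResult`.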
/// A row of data that contains a `Record` and some associated metadata.
public struct RecordRow {
  public internal(set) var record: Record
  public internal(set) var lastReceivedAt: Date
Is it fine-grained enough to keep this per record, or do we need the ability to update this on a per-field basis? If a query only fetches a subset of the fields, we still need to know that the fields that weren't fetched can be stale (their `lastReceivedAt` hasn't changed).
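To make the per-field alternative concrete, a hypothetical sketch (the types and names are illustrative): each field carries its own receipt date, so writing a subset of fields refreshes only those fields' timestamps, and a read can take the oldest date across the fields it touched.

```swift
import Foundation

// Hypothetical record with per-field receipt dates. Merging a partial write
// only refreshes the timestamps of the fields actually written.
struct SketchRecord {
  let key: String
  var fields: [String: String] = [:]
  var fieldReceivedAt: [String: Date] = [:]

  mutating func merge(fields newFields: [String: String], receivedAt date: Date) {
    for (name, value) in newFields {
      fields[name] = value
      fieldReceivedAt[name] = date
    }
  }

  // Oldest receipt date across the requested fields, matching the
  // "age of the oldest record" semantics of merged results.
  func oldestReceipt(of names: [String]) -> Date? {
    names.compactMap { fieldReceivedAt[$0] }.min()
  }
}
```

With this shape, a later query that refetches only `name` leaves `kind`'s timestamp alone, so a TTL check over both fields correctly reports the older of the two.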
let normalizer = GraphQLResultNormalizer()
let executor = GraphQLExecutor { object, info in
  return .result(.success(object[info.responseKeyForField]))
  return .result(.success((object[info.responseKeyForField], Date())))
I don't think we always want to update the `lastReceivedAt` with the current date for every write, because there's often a need to differentiate between server updates and (optimistic) client updates.
So, with this PR, we've run into a bit of an existential dilemma: adding a timestamp at the record level isn't really doable, since fields in one operation may get updated as part of a different operation. Ultimately, to make any kind of TTL mechanism work properly, what we need is field-level metadata (and we probably want it to be opt-in to reduce potential unnecessary overhead). There's also some stuff I know @martijnwalraven wants to address with how normalization works overall, and depending on which direction we ultimately decide to take, there could be pretty significant changes.

This PR isn't going to be it, and I don't want to get anyone's hopes up by leaving this open. I'm going to open a new issue on the repo so we can track this as a feature request - I'll link it back to here. I really want to thank @gsabran and @Mordil for getting this going - I know it's frustrating that this isn't going to get merged, but we think we've got a line on a much more flexible (and accurate) way of doing this for the future.

Please follow #1568 for further updates. Thank you!
Thanks to @gsabran for doing practically all of this in #971!

Changes
- `RecordRow` struct that becomes the underlying storage base type for `RecordSet`, containing the `Record` and `lastReceivedAt` date
- Mechanism in `SQLiteNormalizedCache` to handle schema migrations
- `GraphQLResultContext` object for holding result metadata, such as the `resultAge`
- `context` property on `GraphQLResult` that allows access to the result metadata
- `NormalizedCache.loadRecords` completion receives `RecordRow?` instead of `Record?`
- `RecordSet` uses `RecordRow` rather than `Record`
- `SQLiteNormalizedCache.init(fileURL:shouldVacuumOnClear:)` becomes a convenience initializer
- `SQLiteNormalizedCache` initializers accept an optional `RecordSet`

Motivation
Tests
Comments
The biggest difference between the implementations is that I went ahead and made the `GraphQLResultContext` a property of `GraphQLResult`, which removes the need to change the handlers in `ApolloClient` and avoids making this a larger breaking change.