Retain age of cache and result data #1498
Conversation
Motivation: Any time a developer wants to cache data, they inevitably also want a way to invalidate the cache or know when to ignore its data. To support those use cases, the date the data was last verified as received needs to be stored.

Modifications:
- Add: `RecordRow` struct that becomes the underlying storage base type for `RecordSet`, containing the `Record` and its `lastReceivedAt` date
- Add: mechanism in `SQLiteNormalizedCache` to handle schema migrations

Result: Querying TTL and cache age based on the data's `lastReceivedAt` date will soon be possible.
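The migration mechanism itself isn't shown in this thread. As a rough illustration of the `user_version`-style pattern such a mechanism typically follows (the `MigrationRunner` type and SQL string below are hypothetical, not the PR's actual code):

```swift
// Hypothetical sketch of versioned schema migration, in the style that
// SQLite's PRAGMA user_version enables. `execute` just records statements
// here so the control flow is visible without a real database.
struct MigrationRunner {
  private(set) var schemaVersion: Int
  private(set) var executed: [String] = []

  mutating func execute(_ sql: String) { executed.append(sql) }

  // Each step bumps the stored version, so re-running is a no-op.
  mutating func migrateIfNeeded() {
    if schemaVersion < 1 {
      execute("ALTER TABLE records ADD COLUMN lastReceivedAt REAL NOT NULL DEFAULT 0")
      schemaVersion = 1
    }
  }
}
```

In the real cache the version would be read from and written back to the database via `PRAGMA user_version`, so a store created by an older library version is upgraded exactly once on open.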
Motivation: `RecordRow.lastReceivedAt` is an important property that developers want access to in order to inspect the age of a given result. To provide this, it needs to be returned in GraphQL results in some fashion.

Modifications:
- Add: `GraphQLResultContext` object for holding result metadata, such as the `resultAge`
- Add: `context` property to `GraphQLResult` that allows access to the result metadata
- Change: `NormalizedCache.loadRecords` completion to receive `RecordRow?` instead of `Record?`
- Change: `RecordSet` to use `RecordRow` rather than `Record`
- Change: `SQLiteNormalizedCache.init(fileURL:shouldVacuumOnClear:)` to be a convenience initializer
- Change: `SQLiteNormalizedCache` initializers to accept an optional `RecordSet`

Result: Developers can now access a `context` property on `GraphQLResult` to inspect the metadata behind a given result.
@Mordil: Thank you for submitting a pull request! Before we can merge it, you'll need to sign the Apollo Contributor License Agreement here: https://contribute.apollographql.com/

@designatednerd I think I addressed all of the feedback you had left on #971

@gsabran If there's a different email you want associated with the commit author attribution - let me know and I'll update the commits

I'm going to ask @martijnwalraven to also take a look at this when he's back on Tuesday - he's been neck-deep in cache stuff so I think he will have more opinions than I do.
designatednerd left a comment:
My comments are mostly about formatting and dates. I have gone down way, way, way too many date rabbit holes.
I definitely think @martijnwalraven will have more concrete feedback on the actual functionality here.
func zip<Accumulator1: GraphQLResultAccumulator, Accumulator2: GraphQLResultAccumulator, Accumulator3: GraphQLResultAccumulator, Accumulator4: GraphQLResultAccumulator>(_ accumulator1: Accumulator1, _ accumulator2: Accumulator2, _ accumulator3: Accumulator3, _ accumulator4: Accumulator4) -> Zip4Accumulator<Accumulator1, Accumulator2, Accumulator3, Accumulator4> {
  return Zip4Accumulator(accumulator1, accumulator2, accumulator3, accumulator4)
}
@martijnwalraven I feel like this is something you told me might no longer be necessary with changes to Swift's generics handling - do you have any recollection?
I think this is still needed, if Accumulator1, Accumulator2, ... are different concrete types. There might be some changes to Swift generics in the future to remove this code, specifically "Variadic Generics", but that's still a long way to go.
Yep, variadic generics is what I was talking about. But it indeed seems that might take a while to make it into the language, so we still need these workarounds.
gsabran left a comment:
Wow, this is exciting! To avoid issues with the user changing their device clock, we can use something like:
/// Return a timestamp in ms from an arbitrary origin that is consistent for a given device.
func getCurrentTimestamp() -> Int {
Int(clock_gettime_nsec_np(CLOCK_MONOTONIC) / 1000000)
}
designatednerd left a comment:
Awesome, only a couple comments - will wait for Martijn to get a look Tuesday before proceeding. Thanks for the quick turnaround!
@gsabran What's the timestamp for?
Thanks for this PR! This is definitely a worthwhile feature to add. My comments are mainly about the way we want to extend the existing API in a way that sets us up for future extensions.
The TL;DR is that I'm wary of adding specific pieces of metadata like firstReceivedAt to the core of the execution pipeline, because that has repercussions everywhere (it changes the resolver return value, needs to be passed around in all execution methods, the accumulator, etc.). Instead, I think we should take this opportunity to come up with a general metadata mechanism that we can take advantage of for other use cases.
A few additional pieces of feedback that didn't have a clear place to leave them as comments:
- The term `firstReceivedAt` seems a bit confusing to me. Maybe `lastUpdateFromServer` would be a better description? That also depends on what semantics we expect from this. I don't think we always want to update the `lastReceivedAt` with the current date for every write, because there's often a need to differentiate between server updates and (optimistic) client updates. So maybe we need a separate property for that?
- I'm also not sure the semantics of keeping this per record (as opposed to per field) make sense, because you often fetch subsets of the fields for an object (see comment below).
- Although I think it makes sense to give people access to `firstReceivedAt` from a result when they have a need for custom logic, the example code listed under Motivation seems like the main use case and I think we should have explicit support for it. Maybe some kind of `maxAge` condition on `.returnCacheDataDontFetch`, or a general way to pass a predicate for a conditional fetch (a la HTTP Cache Control)? It could also be something you configure (overridable) defaults for on `ApolloClient`. Besides avoiding repetitive code everywhere a result is received, that would also avoid an unnecessary switch to the main queue and back in case the data is stale and we need to go to the network after all.
- In addition to `maxAge`, the Apollo Cache Control spec uses `scope` to differentiate between public and private (per user) data. That may also be useful to incorporate here, or at least to make sure we leave room for additions like that.
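To make the `maxAge`-style conditional fetch concrete, here is an illustrative sketch; the policy enum and helper function are hypothetical, not Apollo API:

```swift
import Foundation

// Hypothetical maxAge-gated cache policy: return cached data only if it is
// younger than the given age, otherwise fall through to the network.
enum SketchCachePolicy {
  case returnCacheDataElseFetch(maxAge: TimeInterval)
}

// Decide whether a cached result is fresh enough to hand back without a fetch.
func shouldUseCachedResult(lastReceivedAt: Date,
                           policy: SketchCachePolicy,
                           now: Date = Date()) -> Bool {
  switch policy {
  case .returnCacheDataElseFetch(let maxAge):
    return now.timeIntervalSince(lastReceivedAt) <= maxAge
  }
}
```

Because the staleness check runs inside the client, a stale result can go straight to the network without first surfacing the cached data to the caller.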
Something else to mention is that I'm working on an overhaul of the tests, in preparation for making larger changes to the store and execution pipeline. Not sure what the right order here is for merging these two PRs. My hope is that we can merge the tests PR this week, after I've cleaned up and documented what's there right now, and Ellen has had a chance to review it.
I'll be working on larger changes in a separate PR next. That will involve getting rid of the promises implementation and making GraphQL execution synchronous (you'll still be able to perform multiple reads in parallel, it's just that the execution pipeline itself won't be asynchronous any more, which should make the code easier to reason about and solve some long standing threading issues). So we may need to coordinate on that as well to make sure we can merge both PRs.
/// The date when the result was last received.
/// - Note: Apollo may merge several records with different ages when reading from cache data.
///   When such a merge happens, this value will be the age of the oldest record.
public let resultAge: Date
I think calling this `maxAge` would make the semantics clearer, and is also consistent with how we compute this for Apollo Cache Control on the server.
import Foundation

/// Metadata about the returned result.
public struct GraphQLResultContext {
Context is a pretty overloaded term. Maybe `GraphQLResultMetadata`?
func accept(scalar: JSONValue, info: GraphQLResolveInfo) throws -> PartialResult
func acceptNullValue(info: GraphQLResolveInfo) throws -> PartialResult
func accept(scalar: JSONValue, firstReceivedAt: Date, info: GraphQLResolveInfo) throws -> PartialResult
I'd like to find a way to avoid adding individual arguments like `firstReceivedAt` to the core of the execution mechanism, because these additions have repercussions everywhere. Instead, I think we should try and come up with a way to pass around a general notion of updateable metadata (see related comment above for `GraphQLResolver`).
import Foundation

/// A row of data that contains a `Record` and some associated metadata.
public struct RecordRow {
I don't think we need an additional wrapper type for this. `Record` already keeps `key` and `fields` as separate properties, so maybe we could add metadata to that instead?
/// A resolver is responsible for resolving a value for a field.
typealias GraphQLResolver = (_ object: JSONObject, _ info: GraphQLResolveInfo) -> ResultOrPromise<JSONValue?>
typealias GraphQLResolver = (_ object: JSONObject, _ info: GraphQLResolveInfo) -> ResultOrPromise<(JSONValue?, Date)>
I'd like to avoid changing every resolver to return a tuple, especially with something specific like a Date, because that forces the whole execution pipeline to pass that through. I think what we want here instead is a generic mechanism to allow resolvers to optionally set metadata.
In Apollo Server, we add a `cacheControl` object to `info` for this purpose, so you can do:
const resolvers = {
  Query: {
    post: (_parent, { id }, _context, { cacheControl }) => {
      cacheControl.setCacheHint({ maxAge: 60 });
      return find(posts, { id });
    }
  }
}

I'm not too happy with that either, but at least it makes setting cache control hints optional and doesn't affect the rest of the execution pipeline.
I think we could either add a `metadata` property to `info`, or pass metadata in as an additional argument.
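A minimal sketch of that first option, with simplified stand-in types (not Apollo's actual `GraphQLResolveInfo`): the resolver sets metadata on `info` and its return type stays untouched.

```swift
import Foundation

// Simplified stand-in: a mutable metadata bag on the resolve info lets a
// resolver attach a receipt date without changing its return value.
final class SketchResolveInfo {
  var metadata: [String: Any] = [:]
}

typealias SketchResolver = ([String: Any], SketchResolveInfo) -> Any?

// A resolver that records a receipt date as a side channel.
let cacheResolver: SketchResolver = { object, info in
  info.metadata["lastReceivedAt"] = Date(timeIntervalSince1970: 0)
  return object["hero"]
}
```

Because metadata rides on `info`, the rest of the execution pipeline can ignore it entirely unless an accumulator chooses to read it.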
import Foundation

final class GraphQLFirstReceivedAtTracker: GraphQLResultAccumulator {
I like the idea of using a result accumulator for this, that was exactly the type of modular extension I was hoping the accumulator API would allow. The jury is still out on whether accumulators are a good idea, because they do seem to have a terrible performance impact. But we can worry about that later, when rethinking the execution pipeline as a whole.
extensions: [String: Any]?,
errors: [GraphQLError]?,
source: Source,
dependentKeys: Set<CacheKey>?) {
I wonder if `source` and `dependentKeys` would also make sense as properties on a `GraphQLResultMetadata`.
}

public func read<Query: GraphQLQuery>(query: Query) throws -> Query.Data {
public func read<Query: GraphQLQuery>(query: Query) throws -> (Query.Data, GraphQLResultContext) {
I feel returning a tuple makes the use of the store read API a bit awkward, because you're often just interested in the data, and now you have to destructure the returned tuple explicitly everywhere. Not sure if the alternative is any better though. I was thinking of returning `GraphQLResult` from all store read methods instead, but that means you have to go through `data` even if you're doing a `readObject` for an individual fragment.
Maybe a better question to ask is: do we need the public read store API to provide access to metadata at all? You can always use a fetch with `.returnCacheDataDontFetch` to read a result from the cache. I think the main purpose of the store read API is for manual store manipulation within writes, so that may not require the metadata?
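One alternative shape for the tuple problem, purely illustrative rather than a proposed Apollo signature: a small wrapper with named properties, so call sites that only want the data stay readable while metadata remains available to the callers that need it.

```swift
import Foundation

// Hypothetical read result: named fields avoid destructuring a tuple at
// every call site while still exposing metadata when wanted.
struct SketchReadResult<Value> {
  let data: Value
  let metadata: [String: Any]
}

// Stand-in for a store read returning hard-coded sample data.
func sketchRead() -> SketchReadResult<[String: String]> {
  SketchReadResult(data: ["hero": "R2-D2"],
                   metadata: ["resultAge": Date(timeIntervalSince1970: 0)])
}
```

A caller interested only in data writes `sketchRead().data`, with no tuple destructuring and no detour through a full `GraphQLResult`.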
/// A row of data that contains a `Record` and some associated metadata.
public struct RecordRow {
  public internal(set) var record: Record
  public internal(set) var lastReceivedAt: Date
Is it fine-grained enough to keep this per record, or do we need the ability to update this on a per-field basis? If a query only fetches a subset of the fields, we still need to know that the fields that weren't fetched can be stale (their `lastReceivedAt` hasn't changed).
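To make the per-field alternative concrete, a hypothetical sketch (the types and names are illustrative): each field carries its own receipt date, so writing a subset of fields refreshes only those fields' timestamps, and a read can take the oldest date across the fields it touched.

```swift
import Foundation

// Hypothetical record with per-field receipt dates. Merging a partial write
// only refreshes the timestamps of the fields actually written.
struct SketchRecord {
  let key: String
  var fields: [String: String] = [:]
  var fieldReceivedAt: [String: Date] = [:]

  mutating func merge(fields newFields: [String: String], receivedAt date: Date) {
    for (name, value) in newFields {
      fields[name] = value
      fieldReceivedAt[name] = date
    }
  }

  // Oldest receipt date across the requested fields, matching the
  // "age of the oldest record" semantics of merged results.
  func oldestReceipt(of names: [String]) -> Date? {
    names.compactMap { fieldReceivedAt[$0] }.min()
  }
}
```

With this shape, a later query that refetches only `name` leaves `kind`'s timestamp alone, so a TTL check over both fields correctly reports the older of the two.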
let normalizer = GraphQLResultNormalizer()
let executor = GraphQLExecutor { object, info in
  return .result(.success(object[info.responseKeyForField]))
  return .result(.success((object[info.responseKeyForField], Date())))
I don't think we always want to update the `lastReceivedAt` with the current date for every write, because there's often a need to differentiate between server updates and (optimistic) client updates.
So, with this PR, we've run into a bit of an existential dilemma: adding a timestamp at the record level isn't really doable, since fields in one operation may get updated as part of a different operation. Ultimately, to make any kind of TTL mechanism work properly, what we need is field-level metadata (and we probably want it to be opt-in to reduce potential unnecessary overhead). There's also some stuff I know @martijnwalraven wants to address with how normalization works overall, and depending on which direction we ultimately decide to take, there could be pretty significant changes.

This PR isn't going to be it, and I don't want to get anyone's hopes up by leaving this open. I'm going to open a new issue on the repo so we can track this as a feature request - I'll link it back to here. I really want to thank @gsabran and @Mordil for getting this going - I know it's frustrating that this isn't going to get merged, but we think we've got a line on a much more flexible (and accurate) way of doing this for the future.

Please follow #1568 for further updates. Thank you!
Thanks to @gsabran for doing practically all of this in #971!

Changes
- `RecordRow` struct that becomes the underlying storage base type for `RecordSet`, containing the `Record` and `lastReceivedAt` date
- Mechanism in `SQLiteNormalizedCache` to handle schema migrations
- `GraphQLResultContext` object for holding result metadata, such as the `resultAge`
- `context` property on `GraphQLResult` that allows access to the result metadata
- `NormalizedCache.loadRecords` completion receives `RecordRow?` instead of `Record?`
- `RecordSet` uses `RecordRow` rather than `Record`
- `SQLiteNormalizedCache.init(fileURL:shouldVacuumOnClear:)` becomes a convenience initializer
- `SQLiteNormalizedCache` initializers accept an optional `RecordSet`

Motivation
Tests
Comments
The biggest difference between the implementations is that I went ahead and made the `GraphQLResultContext` a property of `GraphQLResult`, which removes the need to change the handlers in `ApolloClient` and avoids making this a larger breaking change.