Skip to content

Retain age of cache and result data#1498

Closed
Mordil wants to merge 13 commits intoapollographql:mainfrom
peek-travel:lastReceivedAt
Closed

Retain age of cache and result data#1498
Mordil wants to merge 13 commits intoapollographql:mainfrom
peek-travel:lastReceivedAt

Conversation

@Mordil
Copy link
Copy Markdown
Contributor

@Mordil Mordil commented Nov 6, 2020

Thanks to @gsabran for doing practically all of this in #971!

Changes

  • Add: RecordRow struct that becomes the underlying storage base type for RecordSet that contains the Record and lastReceivedAt date
  • Add: Mechanism to SQLIteNormalizedCache to handle schema migrations
  • Add: GraphQLResultContext object for holding result metadata, such as the resultAge
  • Add: context property to GraphQLResult that allows access to the result metadata
  • Change: NormalizedCache.loadRecords completion to receive RecordRow? instead of Record?
  • Change: RecordSet to use RecordRow rather than Record
  • Change: SQLiteNormalizedCache.init(fileURL:shouldVacuumOnClear:) to be a convenience initializer
  • Change: SQLIteNormalizedCache initializers to accept an optional RecordSet

Motivation

From @gsabran

Reading from cache is a great way to reduce the number of network requests emitted, and cloud hosting costs 🤑. However developers usually want to make sure that the cache result is not too old, and without this information they are likely to fetch a fresh result just in case. This change makes it possible to know how old the returned result is.

apolloClient.fetch(
  query: query,
  cachePolicy: .returnCacheDataDontFetch,
  context: nil,
  queue: .main
) { result in
  switch result {
    case let .success(result):
      if result.context.resultAge > Date().milisecondsSince1970 - 3600 {
        // use fresh enough data
      } else {
        // fetch
      }
    case let .failure(error):
      // handle error
  }
}

Tests

  1. Database schema migration
  2. Age of result validation

Comments

The biggest difference between the implementations is that I went ahead and made the GraphQLResultContext a property of GraphQLResult which removes the need to change the handlers in ApolloClient and making this a larger breaking change

Gui Sabran added 2 commits November 5, 2020 22:19
Motivation:

Any time a developer wants to cache data they inevitably also want a way to invalidate the cache or know when to ignore its data.

In order to support those use cases, the date that the data was last verified as received needs to be stored.

Modifications:

- Add: `RecordRow` struct that becomes the underlying storage base type for `RecordSet` that contains the `Record` and `lastReceivedAt` date
- Add: Mechanism to `SQLIteNormalizedCache` to handle schema migrations

Result:

TTL and cache age will be soon possible to query based on the data's `lastReceivedAt` date.
Motivation:

`RecordRow.lastReceivedAt` is an important property that developers want access to in order to inspect
the age of a given result.

In order to provide this, it needs to be returned in GraphQL results in some fashion.

Modifications:

- Add: `GraphQLResultContext` object for holding result metadata, such as the `resultAge`
- Add: `context` property to `GraphQLResult` that allows access to the result metadata
- Change: `NormalizedCache.loadRecords` completion to receive `RecordRow?` instead of `Record?`
- Change: `RecordSet` to use `RecordRow` rather than `Record`
- Change: `SQLiteNormalizedCache.init(fileURL:shouldVacuumOnClear:)` to be a convenience initializer
- Change: `SQLIteNormalizedCache` initializers to accept an optional `RecordSet`

Result:

Developers can now access a `context` property on `GraphQLResult` to inspect metadata behind a given result.
@apollo-cla
Copy link
Copy Markdown

@Mordil: Thank you for submitting a pull request! Before we can merge it, you'll need to sign the Apollo Contributor License Agreement here: https://contribute.apollographql.com/

@Mordil
Copy link
Copy Markdown
Contributor Author

Mordil commented Nov 6, 2020

@designatednerd I think I addressed all of the feedback you had left on #971

@Mordil
Copy link
Copy Markdown
Contributor Author

Mordil commented Nov 6, 2020

@gsabran If there's a different email you want associated with the commit author attribution - let me know and I'll update the commits

@Mordil Mordil marked this pull request as ready for review November 6, 2020 06:47
@designatednerd
Copy link
Copy Markdown
Contributor

I'm going to ask @martijnwalraven to also take a look at this when he's back on Tuesday - he's been neck-deep in cache stuff so I think he will have more opinions than I do.

Copy link
Copy Markdown
Contributor

@designatednerd designatednerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments are mostly about formatting and dates. I have gone down way, way, way too many date rabbit holes.

I definitely think @martijnwalraven will have more concrete feedback on the actual functionality here.

Comment thread Sources/Apollo/GraphQLFirstReceivedAtTracker.swift Outdated
Comment thread Sources/Apollo/GraphQLFirstReceivedAtTracker.swift Outdated
Comment thread Sources/Apollo/GraphQLFirstReceivedAtTracker.swift Outdated
Comment thread Sources/Apollo/GraphQLResponse.swift Outdated
func zip<Accumulator1: GraphQLResultAccumulator, Accumulator2: GraphQLResultAccumulator, Accumulator3: GraphQLResultAccumulator, Accumulator4: GraphQLResultAccumulator>(_ accumulator1: Accumulator1, _ accumulator2: Accumulator2, _ accumulator3: Accumulator3, _ accumulator4: Accumulator4) -> Zip4Accumulator<Accumulator1, Accumulator2, Accumulator3, Accumulator4> {
return Zip4Accumulator(accumulator1, accumulator2, accumulator3, accumulator4)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martijnwalraven I feel like this is something you told me might no longer be necessary with changes to Swift's generics handling - do you have any recollection?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is still needed, if Accumulator1, Accumulator2, ... are different concrete types. There might be some changes to Swift generics in the future to remove this code, specifically "Variadic Generics", but that's still a long way to go.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, variadic generics is what I was talking about. But it indeed seems that might take a while to make it into the language, so we still need these workarounds.

Comment thread Sources/ApolloTestSupport/TimeInterval+Helpers.swift Outdated
Comment thread Tests/ApolloCacheDependentTests/LoadQueryFromStoreTests.swift Outdated
Comment thread Tests/ApolloCacheDependentTests/ReadWriteFromStoreTests.swift Outdated
Comment thread Tests/ApolloSQLiteTests/CachePersistenceTests.swift Outdated
Comment thread Tests/ApolloTests/BatchedLoadTests.swift Outdated
Copy link
Copy Markdown

@gsabran gsabran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wao this is exciting!! To avoid issues with the user changing their device clock, we can use something like:

/// Return a timestamp in ms from an arbitrary origin that is consistent for a given device.
func getCurrentTimestamp() -> Int {
  Int(clock_gettime_nsec_np(CLOCK_MONOTONIC) / 1000000)
}

Copy link
Copy Markdown
Contributor

@designatednerd designatednerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, only a couple comments - will wait for Martijn to get a look Tuesday before proceeding. Thanks for the quick turnaround!

Comment thread Tests/ApolloCacheDependentTests/LoadQueryFromStoreTests.swift Outdated
Comment thread Tests/ApolloSQLiteTests/CachePersistenceTests.swift Outdated
@designatednerd
Copy link
Copy Markdown
Contributor

@gsabran What's the timestamp for?

Copy link
Copy Markdown
Contributor

@martijnwalraven martijnwalraven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR! This is definitely a worthwhile feature to add. My comments are mainly about the way we want to extend the existing API in a way that sets us up for future extensions.

The TL;DR is that I'm wary of adding specific pieces of metadata like firstReceivedAt to the core of the execution pipeline, because that has repercussions everywhere (it changes the resolver return value, needs to be passed around in all execution methods, the accumulator, etc.). Instead, I think we should take this opportunity to come up with a general metadata mechanism that we can take advantage of for other use cases.

A few additional pieces of feedback that didn't have a clear place to leave them as comments:

  • The term firstReceivedAt seems a bit confusing to me. Maybe lastUpdateFromServer would be a better description? That also depends on what semantics we expect from this. I don't think we always want to update the lastReceivedAtwith the current date for every write, because there's often a need to differentiate between server updates and (optimistic) client updates. So maybe we need a separate property for that?
  • I'm also not sure the semantics of keeping this per record (as opposed to per field) make sense, because you often fetch subsets of the fields for an object (see comment below).
  • Although I think it makes sense to give people access to firstReceivedAt from a result when they have a need for custom logic, the example code listed under Motivation seems like the main use case and I think we should have explicit support for it. Maybe some kind of maxAge condition on .returnCacheDataDontFetch, or a general way to pass a predicate for a conditional fetch (a la HTTP Cache Control)? It could also be something you configure (overridable) defaults for on ApolloClient. Besides avoiding repetitive code everywhere a result is received, that would also avoid an unnecessary switch to the main queue and back in case the data is stale and we need to go to the network after all.
  • In addition to maxAge, the Apollo Cache Control spec uses scope to differentiate between public and private (per user) data. That may also be useful to incorporate here, or at least to make sure we leave room for additions like that.

Something else to mention is that I'm working on an overhaul of the tests, in preparation for making larger changes to the store and execution pipeline. Not sure what the right order here is for merging these two PRs. My hope is that we can merge the tests PR this week, after I've cleaned up and documented what's there right now, and Ellen has had a chance to review it.

I'll be working on larger changes in a separate PR next. That will involve getting rid of the promises implementation and making GraphQL execution synchronous (you'll still be able to perform multiple reads in parallel, it's just that the execution pipeline itself won't be asynchronous any more, which should make the code easier to reason about and solve some long standing threading issues). So we may need to coordinate on that as well to make sure we can merge both PRs.

Comment thread Sources/Apollo/GraphQLResult.swift Outdated
/// The date when the result was last received.
/// - Note: Apollo may merge several records with different ages when reading from cache data.
/// When such a merge happens, this value will be the age of the oldest record.
public let resultAge: Date
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think calling this maxAge would make the semantics clearer, and is also consistent with how we compute this for Apollo Cache Control on the server.

Comment thread Sources/Apollo/GraphQLResult.swift Outdated
import Foundation

/// Metadata about the returned result.
public struct GraphQLResultContext {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Context is a pretty overloaded term. Maybe GraphQLResultMetadata?

func zip<Accumulator1: GraphQLResultAccumulator, Accumulator2: GraphQLResultAccumulator, Accumulator3: GraphQLResultAccumulator, Accumulator4: GraphQLResultAccumulator>(_ accumulator1: Accumulator1, _ accumulator2: Accumulator2, _ accumulator3: Accumulator3, _ accumulator4: Accumulator4) -> Zip4Accumulator<Accumulator1, Accumulator2, Accumulator3, Accumulator4> {
return Zip4Accumulator(accumulator1, accumulator2, accumulator3, accumulator4)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, variadic generics is what I was talking about. But it indeed seems that might take a while to make it into the language, so we still need these workarounds.


func accept(scalar: JSONValue, info: GraphQLResolveInfo) throws -> PartialResult
func acceptNullValue(info: GraphQLResolveInfo) throws -> PartialResult
func accept(scalar: JSONValue, firstReceivedAt: Date, info: GraphQLResolveInfo) throws -> PartialResult
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to find a way to avoid adding individual arguments like firstReceivedAt to the core of the execution mechanism, because these additions have repercussions everywhere. Instead, I think we should try and come up with a way to pass around a general notion of updateable metadata (see related comment above for GraphQLResolver).

import Foundation

/// A row of data that contains a `Record` and some associated metadata.
public struct RecordRow {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need an additional wrapper type for this. Record already keeps key and fields as a separate properties, so maybe we could add metadata to that instead?


/// A resolver is responsible for resolving a value for a field.
typealias GraphQLResolver = (_ object: JSONObject, _ info: GraphQLResolveInfo) -> ResultOrPromise<JSONValue?>
typealias GraphQLResolver = (_ object: JSONObject, _ info: GraphQLResolveInfo) -> ResultOrPromise<(JSONValue?, Date)>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to avoid changing every resolver to return a tuple, especially with something specific like a Date, because that forces the whole execution pipeline to pass that through. I think what we want here instead is a generic mechanism to allow resolvers to optionally set metadata.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Apollo Server, we add a cacheControl object to info for this purpose, so you can do:

const resolvers = {
  Query: {
    post: (_, { id }, _, { cacheControl }) => {
      cacheControl.setCacheHint({ maxAge: 60 });
      return find(posts, { id });
    }
  }
}

I'm not too happy with that either, but at least it makes setting cache control hints optional and doesn't affect the rest of the execution pipeline.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could either add a metadata property to info, or pass metadata in as an additional argument.

@@ -0,0 +1,27 @@
import Foundation

final class GraphQLFirstReceivedAtTracker: GraphQLResultAccumulator {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of using a result accumulator for this, that was exactly the type of modular extension I was hoping the accumulator API would allow. The jury is still out on whether accumulators are a good idea, because they do seem to have a terrible performance impact. But we can worry about that later, when rethinking the execution pipeline as a whole.

extensions: [String: Any]?,
errors: [GraphQLError]?,
source: Source,
dependentKeys: Set<CacheKey>?) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if source and dependentKeys would also make sense as properties on a GraphQLResultMetadata.

Comment thread Sources/Apollo/ApolloStore.swift Outdated
}

public func read<Query: GraphQLQuery>(query: Query) throws -> Query.Data {
public func read<Query: GraphQLQuery>(query: Query) throws -> (Query.Data, GraphQLResultContext) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel returning a tuple makes the use of the store read API a bit awkward, because you're often just interested in the data, and now you have to destructure the returned tuple explicitly everywhere. Not sure if the alternative is any better though. I was thinking of returning GraphQLResult from all store read methods instead, but that means you have to go through data even if you're doing a readObject for an individual fragment.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a better question to ask is: do we need the public read store API to provide access to metadata at all? You can always use a fetch with .returnCacheDataDontFetch to read a result from the cache. I think the main purpose of the store read API is for manual store manipulation within writes, so that may not require the metadata?

/// A row of data that contains a `Record` and some associated metadata.
public struct RecordRow {
public internal(set) var record: Record
public internal(set) var lastReceivedAt: Date
Copy link
Copy Markdown
Contributor

@martijnwalraven martijnwalraven Nov 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it fine grained enough to keep this per record or do we need the ability to update this on a per field basis? If a query only fetches a subset of the fields, we still need to know that the fields that weren't fetched can be stale (their lastReceivedAt hasn't changed).

let normalizer = GraphQLResultNormalizer()
let executor = GraphQLExecutor { object, info in
return .result(.success(object[info.responseKeyForField]))
return .result(.success((object[info.responseKeyForField], Date())))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we always want to update the lastReceivedAtwith the current date for every write, because there's often a need to differentiate between server updates and (optimistic) client updates.

@designatednerd
Copy link
Copy Markdown
Contributor

So, with this PR, we've run into a bit of an existential dilemma, in that adding a timestamp at the record level isn't really doable, since there may be fields in one operation that get updated as part of a different operation. Ultimately, to make any kind of TTL mechanism work properly, what we need is field-level metadata (and we probably want it to be opt-in to reduce potential unnecessary overhead).

There's also some stuff I know @martijnwalraven wants to address with how normalization works overall, and depending on which direction we ultimately decide with it, there are pretty significant changes

Ultimately, this PR isn't going to be it, and I don't want to get anyone's hopes up by leaving this open. I'm going to open a new issue on the repo so we can track this as a feature request - I'll link it back to here.

I really want to thank @gsabran and @Mordil for getting this going - I know it's frustrating that this isn't going to get merged, but we think we've got a line on a much more flexible (and accurate) way of doing this for the future.

@designatednerd
Copy link
Copy Markdown
Contributor

Please follow #1568 for further updates. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants