Add 'CacheClearingPolicy' for more control over how the cache is cleared by Mordil · Pull Request #1512 · apollographql/apollo-ios

Mordil · 2020-11-12T09:45:41Z

Motivation:

It is common to want finer control over which records are removed from the cache without just deleting all records.

This modifies the framework to support different policies that a developer can specify when calling clear and clearImmediately.

Modifications:

Rename: shouldVacuumOnClear argument label in SQLiteNormalizedCache initializers to compactFileOnClear to make it clearer for those unfamiliar with SQLite's terminology
Add: CacheClearingPolicy with policies for removing the first/last K records, key pattern matching, and the current behavior of removing all records
Add: ApolloStore.clearCache overload that accepts a CacheClearingPolicy
Change: RecordSet.clear to accept a CacheClearingPolicy
Change: NormalizedCache.clear and NormalizedCache.clearImmediately to accept a CacheClearingPolicy
- The current methods are available as overloads
Change: ApolloClientProtocol.clearCache to accept a CacheClearingPolicy
- The current method is available as an overload

Result:

Developers can now have more control over which records to clear from their cache:

// current behavior
client.clearCache(usingPolicy: .allRecords)

// delete the first 10 records
client.clearCache(usingPolicy: .first(10))

// delete all hero records
client.clearCache(usingPolicy: .allMatchingKeyPattern("*hero*"))

designatednerd

Again definitely recommend talking through this with @martijnwalraven, but this seems like a reasonable complement to what you're working on in #1498

martijnwalraven · 2020-11-17T18:32:03Z

+  /// Clears all records whose key matches the provided glob pattern.
+  ///
+  /// For example `*pollo` will match both `Apollo` and `pollo`.
+  case allMatchingKeyPattern(String)


I wonder how useful a regex really is here, because real keys (based on id or other key fields) are usually pretty opaque, and we're actually planning to move away from using path-based keys as a fallback completely.

So it might make more sense to follow Apollo Client web here and just offer a way to clear/evict records by key, perhaps allowing multiple keys to be passed in.

Until we get rid of path-based keys, maybe it would make sense to have evict remove all records that have the specified key as a prefix?

I can see that possible, but what if I want to evict all query responses? .allMatchingKeyPattern("*search*")

With a prefix that requires me knowing the implementation detail that Apollo uses the QUERY_ROOT prefix in order to evict query responses.

.anyKeyWithPrefix("QUERY_ROOT") is effectively the same as .allMatchingKeyPattern("QUERY_ROOT*")

What would you expect .allMatchingKeyPattern("*search*") to do, and what is the use case for that?

We definitely have to do a better job explaining how the normalized cache works. My hope is that removing path-based keys will make things easier, but I also want to make sure our APIs don't complicate matters further.

For now, the problem is that .allMatchingKeyPattern("QUERY_ROOT*") likely won't do what people think it does. It would only remove records that do not have their own key configured (through cacheKeyForObject()). Those objects won't have an id or other key field that is used to uniquely identify them, so we currently use a fake path-based key, which means the results of different queries can't be merged.

Most of the time, that is not what you want, because that means missing out on all the benefits of normalization (while still paying the performance costs of storing objects as separate records). We will likely follow the example of Apollo Client web v3, and simply nest these "value objects" within the same record.

At the same time, we should make it easier for people to configure keys (likely with sensible defaults that concatenates __typename and id). What does your cacheKeyForObject() look like right now?

If your cacheKeyForObject() concatenates __typename and id, and the result of a query that requests user(id: 123) is stored in the cache, QUERY_ROOT would only contain a reference to the User:123 record. So you wouldn't remove that record by removing QUERY_ROOT*, just the mapping between user(id: 123) and that record.

I suspect the more common case would be to want to remove individual objects by key (so removing the user with id 123 for example). Ideally we'd find a way to expose this in a type safe way, but for now it may be ok to rely on manually constructing keys (after all, that's what cacheKeyForObject() also relies on). So you'd do a store.evict("User:123") for example. There may also be a need to remove individual fields, especially on the query root.

What scenarios do you see in your app? How fine grained are the cache eviction operations you have found a need for?

Our cacheKeyForObject is looking at just __typename and id, otherwise returning nil.

We don't have a particular use case for this clearing policy in question - we're actually after an age-based policy, but since that is some longer-term work, we have the policy implemented only in our fork for now.

These other policies were ones that I brainstormed in order to make this a valuable change to the codebase, rather than adding an enum with a single case.

If you think there's value in just making the change to add a single case (which is the default value used everywhere) as a foundation to build off of, and we later add policies on a singular case-by-case basis, then I'm happy to remove this policy.

My example is coming from the GitHub GraphQL API with a very minimal setup of Apollo that uses the same __typename and id cache key generation.

An example of a repository search would provide the following record:

_id key record

247393 QUERY_ROOT.search(after:Y3Vyc29yOjc2MA==,first:30,query:Design,type:REPOSITORY).edges.29 {"node":{"$reference":"Repository:MDEwOlJlcG9zaXRvcnkyMjMyOTQzMjA="},"__typename":"SearchResultItemEdge"}

I would expect *search* to remove any key that contains the phrase search. I wouldn't necessarily expect the referenced records to be deleted, just the query response record(s) themselves.

Doing it by prefix provides two issues:

It forces me to understand that the default key prefix is QUERY_ROOT, which is an implementation detail that could, and you allude to it will, change

If I want to delete all searches, I need to have the longer "QUERY_ROOT.search" as the value, which adds noise to the intent of the cache clear: I just want to clear up any key that is a search related record.

Taking away the situation where Apollo's defaults are used partially or wholly - having a prefix only policy pushes effort on the cache key algorithm the developers establishes to be mindful of possibly easier cleanup later because they would then have to specify the prefix in order to match when it's a glob pattern, they can be much more liberal with their key generation.

I think what may be confusing about the current situation is that the path-based fallback blurs the distinction between object keys and field keys. An object key is the result of cacheKeyForObject, so something like User:123. A field key is generated by combining the field name with argument values.

Because we currently generate separate records for every level of a query result, regardless of whether an object has a real cache key, you end up with lots of useless records like QUERY_ROOT.search(after:Y3Vyc29yOjc2MA==,first:30,query:Design,type:REPOSITORY).edges.29.

Note that search is not the cache key of an object here, it is a field name. It's an implementation detail that we currently store every edge of a connection as a separate record, using the complete path as a key, and that is something we plan on fixing. But that's exactly the reason the API shouldn't rely on this.

The only real cache keys are things like User:123, and Repository:MDEwOlJlcG9zaXRvcnkyMjMyOTQzMjA=. The query root is a special case however. Queries start by selecting fields on the query root, which is a singleton object for which we use the cache key QUERY_ROOT. The fact that queries start at a fixed root object is not an implementation detail of Apollo, but part of the GraphQL spec. But as soon as you encounter an object with an identity of its own, your cache key will no longer have that prefix.

We can't support .allMatchingKeyPattern("*search*"), because it isn't clear what that would mean. You need to make clear what object (or fields of that object) you are evicting. We have no way of efficiently selecting all records by the fields they contain, because records are only indexed by cache key. And even if we did find a way to do that, I suspect the semantics would be more confusing than helpful.

What you might want to say instead is something like store.evict("QUERY_ROOT", field: "search"), which would evict all results for the search field from the QUERY_ROOT record. Specifying QUERY_ROOT is necessary here, because your schema may have other types that have a search field defined. store.evict("Company:123", field: "search") would evict results for a search field defined on a Company type for example. (We could define a constant to avoid using a literal QUERY_ROOT here, or perhaps some kind of default, but you still need to be able to differentiate between evicting a root field or a field on another object.)

That doesn't mean there is no place for regexes. I can imagine it being useful to be able to do store.evict("User:*") for example, which would evict all users. Or even store.evict("QUERY_ROOT", field: "*search*Design*") if we are ok relying on how field keys are encoded.

For what it's worth, this is not set up as a regex under the hood - it's just dropping the asterisks out of the string. if we're intending for this to be regex-compatible there's a lot more work we need to do to make that happen.

With what's in here at the moment, *search*query:Design would just get translated to searchquery:Design, which I don't think would find anything.

That's only for InMemoryNormalizedCache. SQLiteNormalizedCache replaces the * with % to be able to use the LIKE operator

That seems like a fairly significant difference based on which cache you're using, which IMO seems like it could be really confusing for users (and would also make things like a hybrid in-memory and SQLite cache much more difficult to implement in the future)

That's a good point. This is my bad as a contributor, as I was focused primarily on SQLite and didn't invest a lot of time about fully considering InMemory.

The InMemory cache should probably actually use a regex to compare, but it sounds like we're moving towards explicit keys or at least a prefix.

designatednerd · 2020-12-10T00:05:11Z

After discussion between @martijnwalraven and myself, we've decided on a two-part approach for proceeding with cache clearing stuff:

Implement Add deleteObject(key) to ReadWriteTransaction #1555 to allow for deleting the object for a particular key. It won't be as flexible as this, but it does at least give some ability to get rid of things in the cache, which we do lack at the moment. This will be happening within the next few weeks.
Work on a longer-term solution to cache eviction and invalidation. On web, we've made some solid progress with this, but there's some significant changes we'd need to make to make that work on iOS. This is likely going to be part of our work in Q1 of 2021.

Unfortunately, this PR isn't quite what we're looking for longer-term, and it may make some of the things we want to do longer-term harder, so while we really appreciate the effort and your time bouncing ideas here, I'm gonna go ahead and close this one out.

Mordil force-pushed the cache-cleanup branch from 78326dc to 6335435 Compare November 12, 2020 09:46

Mordil mentioned this pull request Nov 12, 2020

CORE-6987 -- Cache Cleanup Policies peek-travel/apollo-ios#2

Merged

designatednerd reviewed Nov 12, 2020

View reviewed changes

martijnwalraven reviewed Nov 17, 2020

View reviewed changes

Mordil added 10 commits November 17, 2020 22:27

Cleanup 'GraphQLOperationType' provision of cache key

01c458f

Cleanup ApolloStore initializers

6a93e93

Add 'CacheClearingPolicy' for users to have more control over cleanup

33b66da

rename cacheKey to rootCacheKey

0deb473

match project code style

ceb4e13

Fix allMatchingKeyPattern cache clearing policy for in-memory cache

7e4feee

Move clearCache overload to 'ApolloClient'

af103d8

Use enum for 'CacheClearingPolicy'

7388cba

Remove 'first' and 'last' CacheClearingPolicy cases

436336c

Remove locks in InMemoryNormalizedCache

5470bb7

Mordil force-pushed the cache-cleanup branch from 3c4de5c to 5470bb7 Compare November 18, 2020 07:06

designatednerd added the caching label Nov 24, 2020

designatednerd mentioned this pull request Dec 9, 2020

Add cache fallback CachePolicy #1520

Closed

designatednerd closed this Dec 10, 2020

Mordil deleted the cache-cleanup branch January 26, 2023 00:33

Conversation

Mordil commented Nov 12, 2020

Uh oh!

designatednerd left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

martijnwalraven Nov 17, 2020

Choose a reason for hiding this comment

Uh oh!

martijnwalraven Nov 17, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Mordil Nov 18, 2020

Choose a reason for hiding this comment

Uh oh!

martijnwalraven Nov 18, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Mordil Nov 18, 2020

Choose a reason for hiding this comment

Uh oh!

martijnwalraven Nov 18, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

designatednerd Nov 18, 2020

Choose a reason for hiding this comment

Uh oh!

Mordil Nov 18, 2020

Choose a reason for hiding this comment

Uh oh!

designatednerd Nov 18, 2020

Choose a reason for hiding this comment

Uh oh!

Mordil Nov 18, 2020

Choose a reason for hiding this comment

Uh oh!

designatednerd commented Dec 10, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

martijnwalraven Nov 17, 2020 •

edited

Loading

martijnwalraven Nov 18, 2020 •

edited

Loading

martijnwalraven Nov 18, 2020 •

edited

Loading