Skip to content

Add 'CacheClearingPolicy' for more control over how the cache is cleared#1512

Closed
Mordil wants to merge 10 commits intoapollographql:mainfrom
peek-travel:cache-cleanup
Closed

Add 'CacheClearingPolicy' for more control over how the cache is cleared#1512
Mordil wants to merge 10 commits intoapollographql:mainfrom
peek-travel:cache-cleanup

Conversation

@Mordil
Copy link
Copy Markdown
Contributor

@Mordil Mordil commented Nov 12, 2020

Motivation:

It is common to want finer control over which records are removed from the cache without just deleting all records.

This modifies the framework to support different policies that a developer can specify when calling clear and clearImmediately.

Modifications:

  • Rename: shouldVacuumOnClear argument label in SQLiteNormalizedCache initializers to compactFileOnClear to make it clearer for those unfamiliar with SQLite's terminology
  • Add: CacheClearingPolicy with policies for removing the first/last K records, key pattern matching, and the current behavior of removing all records
  • Add: ApolloStore.clearCache overload that accepts a CacheClearingPolicy
  • Change: RecordSet.clear to accept a CacheClearingPolicy
  • Change: NormalizedCache.clear and NormalizedCache.clearImmediately to accept a CacheClearingPolicy
    • The current methods are available as overloads
  • Change: ApolloClientProtocol.clearCache to accept a CacheClearingPolicy
    • The current method is available as an overload

Result:

Developers can now have more control over which records to clear from their cache:

// current behavior
client.clearCache(usingPolicy: .allRecords)

// delete the first 10 records
client.clearCache(usingPolicy: .first(10))

// delete all hero records
client.clearCache(usingPolicy: .allMatchingKeyPattern("*hero*"))

Copy link
Copy Markdown
Contributor

@designatednerd designatednerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again definitely recommend talking through this with @martijnwalraven, but this seems like a reasonable complement to what you're working on in #1498

Comment thread Sources/Apollo/ApolloClientProtocol.swift Outdated
Comment thread Sources/Apollo/GraphQLOperation.swift Outdated
Comment thread Sources/Apollo/GraphQLOperation.swift
Comment thread Sources/Apollo/InMemoryNormalizedCache.swift Outdated
Comment thread Sources/Apollo/InMemoryNormalizedCache.swift Outdated
Comment thread Sources/Apollo/RecordSet.swift Outdated
Comment thread Sources/ApolloSQLite/SQLiteNormalizedCache.swift
Comment thread Sources/ApolloSQLite/SQLiteNormalizedCache.swift Outdated
Comment thread Sources/ApolloSQLite/SQLiteNormalizedCache.swift Outdated
Comment thread Tests/ApolloSQLiteTests/CachePersistenceTests.swift Outdated
Comment thread Sources/Apollo/NormalizedCache.swift Outdated
Comment thread Sources/Apollo/RecordSet.swift Outdated
/// Clears all records whose key matches the provided glob pattern.
///
/// For example `*pollo` will match both `Apollo` and `pollo`.
case allMatchingKeyPattern(String)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how useful a regex really is here, because real keys (based on id or other key fields) are usually pretty opaque, and we're actually planning to move away from using path-based keys as a fallback completely.

So it might make more sense to follow Apollo Client web here and just offer a way to clear/evict records by key, perhaps allowing multiple keys to be passed in.

Copy link
Copy Markdown
Contributor

@martijnwalraven martijnwalraven Nov 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Until we get rid of path-based keys, maybe it would make sense to have evict remove all records that have the specified key as a prefix?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see that possible, but what if I want to evict all query responses? .allMatchingKeyPattern("*search*")

With a prefix that requires me knowing the implementation detail that Apollo uses the QUERY_ROOT prefix in order to evict query responses.

.anyKeyWithPrefix("QUERY_ROOT") is effectively the same as .allMatchingKeyPattern("QUERY_ROOT*")

Copy link
Copy Markdown
Contributor

@martijnwalraven martijnwalraven Nov 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would you expect .allMatchingKeyPattern("*search*") to do, and what is the use case for that?

We definitely have to do a better job explaining how the normalized cache works. My hope is that removing path-based keys will make things easier, but I also want to make sure our APIs don't complicate matters further.

For now, the problem is that .allMatchingKeyPattern("QUERY_ROOT*") likely won't do what people think it does. It would only remove records that do not have their own key configured (through cacheKeyForObject()). Those objects won't have an id or other key field that is used to uniquely identify them, so we currently use a fake path-based key, which means the results of different queries can't be merged.

Most of the time, that is not what you want, because that means missing out on all the benefits of normalization (while still paying the performance costs of storing objects as separate records). We will likely follow the example of Apollo Client web v3, and simply nest these "value objects" within the same record.

At the same time, we should make it easier for people to configure keys (likely with sensible defaults that concatenates __typename and id). What does your cacheKeyForObject() look like right now?

If your cacheKeyForObject() concatenates __typename and id, and the result of a query that requests user(id: 123) is stored in the cache, QUERY_ROOT would only contain a reference to the User:123 record. So you wouldn't remove that record by removing QUERY_ROOT*, just the mapping between user(id: 123) and that record.

I suspect the more common case would be to want to remove individual objects by key (so removing the user with id 123 for example). Ideally we'd find a way to expose this in a type safe way, but for now it may be ok to rely on manually constructing keys (after all, that's what cacheKeyForObject() also relies on). So you'd do a store.evict("User:123") for example. There may also be a need to remove individual fields, especially on the query root.

What scenarios do you see in your app? How fine grained are the cache eviction operations you have found a need for?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our cacheKeyForObject is looking at just __typename and id, otherwise returning nil.

We don't have a particular use case for this clearing policy in question - we're actually after an age-based policy, but since that is some longer-term work, we have the policy implemented only in our fork for now.

These other policies were ones that I brainstormed in order to make this a valuable change to the codebase, rather than adding an enum with a single case.

If you think there's value in just making the change to add a single case (which is the default value used everywhere) as a foundation to build off of, and we later add policies on a singular case-by-case basis, then I'm happy to remove this policy.


My example is coming from the GitHub GraphQL API with a very minimal setup of Apollo that uses the same __typename and id cache key generation.

An example of a repository search would provide the following record:

_id key record
247393 QUERY_ROOT.search(after:Y3Vyc29yOjc2MA==,first:30,query:Design,type:REPOSITORY).edges.29 {"node":{"$reference":"Repository:MDEwOlJlcG9zaXRvcnkyMjMyOTQzMjA="},"__typename":"SearchResultItemEdge"}

I would expect *search* to remove any key that contains the phrase search. I wouldn't necessarily expect the referenced records to be deleted, just the query response record(s) themselves.

Doing it by prefix provides two issues:

  1. It forces me to understand that the default key prefix is QUERY_ROOT, which is an implementation detail that could, and you allude to it will, change
  2. If I want to delete all searches, I need to have the longer "QUERY_ROOT.search" as the value, which adds noise to the intent of the cache clear: I just want to clear up any key that is a search related record.

Taking away the situation where Apollo's defaults are used partially or wholly - having a prefix only policy pushes effort on the cache key algorithm the developers establishes to be mindful of possibly easier cleanup later because they would then have to specify the prefix in order to match when it's a glob pattern, they can be much more liberal with their key generation.

Copy link
Copy Markdown
Contributor

@martijnwalraven martijnwalraven Nov 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what may be confusing about the current situation is that the path-based fallback blurs the distinction between object keys and field keys. An object key is the result of cacheKeyForObject, so something like User:123. A field key is generated by combining the field name with argument values.

Because we currently generate separate records for every level of a query result, regardless of whether an object has a real cache key, you end up with lots of useless records like QUERY_ROOT.search(after:Y3Vyc29yOjc2MA==,first:30,query:Design,type:REPOSITORY).edges.29.

Note that search is not the cache key of an object here, it is a field name. It's an implementation detail that we currently store every edge of a connection as a separate record, using the complete path as a key, and that is something we plan on fixing. But that's exactly the reason the API shouldn't rely on this.

The only real cache keys are things like User:123, and Repository:MDEwOlJlcG9zaXRvcnkyMjMyOTQzMjA=. The query root is a special case however. Queries start by selecting fields on the query root, which is a singleton object for which we use the cache key QUERY_ROOT. The fact that queries start at a fixed root object is not an implementation detail of Apollo, but part of the GraphQL spec. But as soon as you encounter an object with an identity of its own, your cache key will no longer have that prefix.

We can't support .allMatchingKeyPattern("*search*"), because it isn't clear what that would mean. You need to make clear what object (or fields of that object) you are evicting. We have no way of efficiently selecting all records by the fields they contain, because records are only indexed by cache key. And even if we did find a way to do that, I suspect the semantics would be more confusing than helpful.

What you might want to say instead is something like store.evict("QUERY_ROOT", field: "search"), which would evict all results for the search field from the QUERY_ROOT record. Specifying QUERY_ROOT is necessary here, because your schema may have other types that have a search field defined. store.evict("Company:123", field: "search") would evict results for a search field defined on a Company type for example. (We could define a constant to avoid using a literal QUERY_ROOT here, or perhaps some kind of default, but you still need to be able to differentiate between evicting a root field or a field on another object.)

That doesn't mean there is no place for regexes. I can imagine it being useful to be able to do store.evict("User:*") for example, which would evict all users. Or even store.evict("QUERY_ROOT", field: "*search*Design*") if we are ok relying on how field keys are encoded.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth, this is not set up as a regex under the hood - it's just dropping the asterisks out of the string. if we're intending for this to be regex-compatible there's a lot more work we need to do to make that happen.

With what's in here at the moment, *search*query:Design would just get translated to searchquery:Design, which I don't think would find anything.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's only for InMemoryNormalizedCache. SQLiteNormalizedCache replaces the * with % to be able to use the LIKE operator

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like a fairly significant difference based on which cache you're using, which IMO seems like it could be really confusing for users (and would also make things like a hybrid in-memory and SQLite cache much more difficult to implement in the future)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. This is my bad as a contributor, as I was focused primarily on SQLite and didn't invest a lot of time about fully considering InMemory.

The InMemory cache should probably actually use a regex to compare, but it sounds like we're moving towards explicit keys or at least a prefix.

@designatednerd
Copy link
Copy Markdown
Contributor

After discussion between @martijnwalraven and myself, we've decided on a two-part approach for proceeding with cache clearing stuff:

  1. Implement Add deleteObject(key) to ReadWriteTransaction #1555 to allow for deleting the object for a particular key. It won't be as flexible as this, but it does at least give some ability to get rid of things in the cache, which we do lack at the moment. This will be happening within the next few weeks.
  2. Work on a longer-term solution to cache eviction and invalidation. On web, we've made some solid progress with this, but there's some significant changes we'd need to make to make that work on iOS. This is likely going to be part of our work in Q1 of 2021.

Unfortunately, this PR isn't quite what we're looking for longer-term, and it may make some of the things we want to do longer-term harder, so while we really appreciate the effort and your time bouncing ideas here, I'm gonna go ahead and close this one out.

@Mordil Mordil deleted the cache-cleanup branch January 26, 2023 00:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants