Add 'CacheClearingPolicy' for more control over how the cache is cleared#1512
Add 'CacheClearingPolicy' for more control over how the cache is cleared#1512Mordil wants to merge 10 commits intoapollographql:mainfrom
Conversation
78326dc to
6335435
Compare
designatednerd
left a comment
There was a problem hiding this comment.
Again definitely recommend talking through this with @martijnwalraven, but this seems like a reasonable complement to what you're working on in #1498
| /// Clears all records whose key matches the provided glob pattern. | ||
| /// | ||
| /// For example `*pollo` will match both `Apollo` and `pollo`. | ||
| case allMatchingKeyPattern(String) |
There was a problem hiding this comment.
I wonder how useful a regex really is here, because real keys (based on id or other key fields) are usually pretty opaque, and we're actually planning to move away from using path-based keys as a fallback completely.
So it might make more sense to follow Apollo Client web here and just offer a way to clear/evict records by key, perhaps allowing multiple keys to be passed in.
There was a problem hiding this comment.
Until we get rid of path-based keys, maybe it would make sense to have evict remove all records that have the specified key as a prefix?
There was a problem hiding this comment.
I can see that possible, but what if I want to evict all query responses? .allMatchingKeyPattern("*search*")
With a prefix that requires me knowing the implementation detail that Apollo uses the QUERY_ROOT prefix in order to evict query responses.
.anyKeyWithPrefix("QUERY_ROOT") is effectively the same as .allMatchingKeyPattern("QUERY_ROOT*")
There was a problem hiding this comment.
What would you expect .allMatchingKeyPattern("*search*") to do, and what is the use case for that?
We definitely have to do a better job explaining how the normalized cache works. My hope is that removing path-based keys will make things easier, but I also want to make sure our APIs don't complicate matters further.
For now, the problem is that .allMatchingKeyPattern("QUERY_ROOT*") likely won't do what people think it does. It would only remove records that do not have their own key configured (through cacheKeyForObject()). Those objects won't have an id or other key field that is used to uniquely identify them, so we currently use a fake path-based key, which means the results of different queries can't be merged.
Most of the time, that is not what you want, because that means missing out on all the benefits of normalization (while still paying the performance costs of storing objects as separate records). We will likely follow the example of Apollo Client web v3, and simply nest these "value objects" within the same record.
At the same time, we should make it easier for people to configure keys (likely with sensible defaults that concatenates __typename and id). What does your cacheKeyForObject() look like right now?
If your cacheKeyForObject() concatenates __typename and id, and the result of a query that requests user(id: 123) is stored in the cache, QUERY_ROOT would only contain a reference to the User:123 record. So you wouldn't remove that record by removing QUERY_ROOT*, just the mapping between user(id: 123) and that record.
I suspect the more common case would be to want to remove individual objects by key (so removing the user with id 123 for example). Ideally we'd find a way to expose this in a type safe way, but for now it may be ok to rely on manually constructing keys (after all, that's what cacheKeyForObject() also relies on). So you'd do a store.evict("User:123") for example. There may also be a need to remove individual fields, especially on the query root.
What scenarios do you see in your app? How fine grained are the cache eviction operations you have found a need for?
There was a problem hiding this comment.
Our cacheKeyForObject is looking at just __typename and id, otherwise returning nil.
We don't have a particular use case for this clearing policy in question - we're actually after an age-based policy, but since that is some longer-term work, we have the policy implemented only in our fork for now.
These other policies were ones that I brainstormed in order to make this a valuable change to the codebase, rather than adding an enum with a single case.
If you think there's value in just making the change to add a single case (which is the default value used everywhere) as a foundation to build off of, and we later add policies on a singular case-by-case basis, then I'm happy to remove this policy.
My example is coming from the GitHub GraphQL API with a very minimal setup of Apollo that uses the same __typename and id cache key generation.
An example of a repository search would provide the following record:
| _id | key | record |
|---|---|---|
| 247393 | QUERY_ROOT.search(after:Y3Vyc29yOjc2MA==,first:30,query:Design,type:REPOSITORY).edges.29 |
{"node":{"$reference":"Repository:MDEwOlJlcG9zaXRvcnkyMjMyOTQzMjA="},"__typename":"SearchResultItemEdge"} |
I would expect *search* to remove any key that contains the phrase search. I wouldn't necessarily expect the referenced records to be deleted, just the query response record(s) themselves.
Doing it by prefix provides two issues:
- It forces me to understand that the default key prefix is
QUERY_ROOT, which is an implementation detail that could, and you allude to it will, change - If I want to delete all searches, I need to have the longer
"QUERY_ROOT.search"as the value, which adds noise to the intent of the cache clear: I just want to clear up any key that is asearchrelated record.
Taking away the situation where Apollo's defaults are used partially or wholly - having a prefix only policy pushes effort on the cache key algorithm the developers establishes to be mindful of possibly easier cleanup later because they would then have to specify the prefix in order to match when it's a glob pattern, they can be much more liberal with their key generation.
There was a problem hiding this comment.
I think what may be confusing about the current situation is that the path-based fallback blurs the distinction between object keys and field keys. An object key is the result of cacheKeyForObject, so something like User:123. A field key is generated by combining the field name with argument values.
Because we currently generate separate records for every level of a query result, regardless of whether an object has a real cache key, you end up with lots of useless records like QUERY_ROOT.search(after:Y3Vyc29yOjc2MA==,first:30,query:Design,type:REPOSITORY).edges.29.
Note that search is not the cache key of an object here, it is a field name. It's an implementation detail that we currently store every edge of a connection as a separate record, using the complete path as a key, and that is something we plan on fixing. But that's exactly the reason the API shouldn't rely on this.
The only real cache keys are things like User:123, and Repository:MDEwOlJlcG9zaXRvcnkyMjMyOTQzMjA=. The query root is a special case however. Queries start by selecting fields on the query root, which is a singleton object for which we use the cache key QUERY_ROOT. The fact that queries start at a fixed root object is not an implementation detail of Apollo, but part of the GraphQL spec. But as soon as you encounter an object with an identity of its own, your cache key will no longer have that prefix.
We can't support .allMatchingKeyPattern("*search*"), because it isn't clear what that would mean. You need to make clear what object (or fields of that object) you are evicting. We have no way of efficiently selecting all records by the fields they contain, because records are only indexed by cache key. And even if we did find a way to do that, I suspect the semantics would be more confusing than helpful.
What you might want to say instead is something like store.evict("QUERY_ROOT", field: "search"), which would evict all results for the search field from the QUERY_ROOT record. Specifying QUERY_ROOT is necessary here, because your schema may have other types that have a search field defined. store.evict("Company:123", field: "search") would evict results for a search field defined on a Company type for example. (We could define a constant to avoid using a literal QUERY_ROOT here, or perhaps some kind of default, but you still need to be able to differentiate between evicting a root field or a field on another object.)
That doesn't mean there is no place for regexes. I can imagine it being useful to be able to do store.evict("User:*") for example, which would evict all users. Or even store.evict("QUERY_ROOT", field: "*search*Design*") if we are ok relying on how field keys are encoded.
There was a problem hiding this comment.
For what it's worth, this is not set up as a regex under the hood - it's just dropping the asterisks out of the string. if we're intending for this to be regex-compatible there's a lot more work we need to do to make that happen.
With what's in here at the moment, *search*query:Design would just get translated to searchquery:Design, which I don't think would find anything.
There was a problem hiding this comment.
That's only for InMemoryNormalizedCache. SQLiteNormalizedCache replaces the * with % to be able to use the LIKE operator
There was a problem hiding this comment.
That seems like a fairly significant difference based on which cache you're using, which IMO seems like it could be really confusing for users (and would also make things like a hybrid in-memory and SQLite cache much more difficult to implement in the future)
There was a problem hiding this comment.
That's a good point. This is my bad as a contributor, as I was focused primarily on SQLite and didn't invest a lot of time about fully considering InMemory.
The InMemory cache should probably actually use a regex to compare, but it sounds like we're moving towards explicit keys or at least a prefix.
3c4de5c to
5470bb7
Compare
|
After discussion between @martijnwalraven and myself, we've decided on a two-part approach for proceeding with cache clearing stuff:
Unfortunately, this PR isn't quite what we're looking for longer-term, and it may make some of the things we want to do longer-term harder, so while we really appreciate the effort and your time bouncing ideas here, I'm gonna go ahead and close this one out. |
Motivation:
It is common to want finer control over which records are removed from the cache without just deleting all records.
This modifies the framework to support different policies that a developer can specify when calling
clearandclearImmediately.Modifications:
shouldVacuumOnClearargument label inSQLiteNormalizedCacheinitializers tocompactFileOnClearto make it clearer for those unfamiliar with SQLite's terminologyCacheClearingPolicywith policies for removing the first/last K records, key pattern matching, and the current behavior of removing all recordsApolloStore.clearCacheoverload that accepts aCacheClearingPolicyRecordSet.clearto accept aCacheClearingPolicyNormalizedCache.clearandNormalizedCache.clearImmediatelyto accept aCacheClearingPolicyApolloClientProtocol.clearCacheto accept aCacheClearingPolicyResult:
Developers can now have more control over which records to clear from their cache: