Skip to content

Initial persistent caching prototype#967

Merged
ibraheemdev merged 9 commits into
salsa-rs:masterfrom
ibraheemdev:ibraheem/persistent-caching
Aug 11, 2025
Merged

Initial persistent caching prototype#967
ibraheemdev merged 9 commits into
salsa-rs:masterfrom
ibraheemdev:ibraheem/persistent-caching

Conversation

@ibraheemdev

@ibraheemdev ibraheemdev commented Aug 8, 2025

Copy link
Copy Markdown
Member

The overall design is laid out in this document. All the new functionality is gated under the persistence feature. However, this PR has a couple of limitations:

  • All transitive dependencies are forced to be serializable.
  • There is no easy way to update inputs after they are deserialized.
  • Query hashing is not implemented.
  • Accumulators and supertypes are not serializable.

@netlify

netlify Bot commented Aug 8, 2025

Copy link
Copy Markdown

Deploy Preview for salsa-rs canceled.

Name Link
🔨 Latest commit 4103798
🔍 Latest deploy log https://app.netlify.com/projects/salsa-rs/deploys/6896b522cda9d500081f1913

@ibraheemdev ibraheemdev force-pushed the ibraheem/persistent-caching branch from 8931c07 to caf7925 Compare August 8, 2025 03:43
deserialize_fn: Some(deserialize_fn),
}) => Some(quote! { (#serialize_fn, #deserialize_fn) }),
// Fallback to the `{Serialize, Deserialize}` implementations tuples if only one of the
// two is provided.

@ibraheemdev ibraheemdev Aug 8, 2025

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a little bit of weirdness here in that all of the following are accepted:

#[salsa::tracked(serialize)]
#[salsa::tracked(deserialize)]
#[salsa::tracked(serialize, deserialize)]
#[salsa::tracked(serialize = ...)]
#[salsa::tracked(deserialize = ...)]
#[salsa::tracked(serialize = ..., deserialize = ...)]
struct ...;

But I'm not sure disallowing any of those is better. With the derive-macro API, we should be able to get rid of this anyways and just use serde::{Serialize, Deserialize} directly.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this into a persist attribute to make things more consistent:

#[salsa::tracked(persist)]
#[salsa::tracked(persist(serialize = ...))]
#[salsa::tracked(persist(deserialize = ...))]
#[salsa::tracked(persist(serialize = ..., deserialize = ...))]
struct ...;

We don't accept the serialize and deserialize arguments on tracked functions because both the input and output types need to implement serde::{Serialize, Deserialize}, and the arguments would be ambiguous.

@codspeed-hq

codspeed-hq Bot commented Aug 8, 2025

Copy link
Copy Markdown

CodSpeed Performance Report

Merging #967 will improve performances by 8.88%

Comparing ibraheemdev:ibraheem/persistent-caching (4103798) with master (940f9c0)

Summary

⚡ 3 improvements
✅ 9 untouched benchmarks

Benchmarks breakdown

Benchmark BASE HEAD Change
amortized[Input] 2.3 µs 2.1 µs +7.39%
amortized[InternedInput] 2.3 µs 2.1 µs +8.88%
new[InternedInput] 4.2 µs 4 µs +5.43%

@ibraheemdev ibraheemdev force-pushed the ibraheem/persistent-caching branch 5 times, most recently from 77b1f21 to 15a3110 Compare August 8, 2025 04:08
Comment thread src/database.rs Outdated
let DeserializeIngredients(zalsa) = self;

while let Some(name) = access.next_key::<&str>()? {
// TODO: Serialize the ingredient index directly.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically the debug name is not a unique identifier, but I found it more helpful for debugging than the ingredient index. It could also just be a field on each ingredient.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A separate (maybe even optinal) field would make sense to me.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the debug name for now, it's unfortunately very annoying to add extra fields because serde's macros aren't compatible with DeserializeSeed (which we need for some fields, because we deserialize into the table directly).

@MichaReiser MichaReiser left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. I didn't read line by line and instead focused on the overall design.

What's your plan with this PR? Do you want to land it behind a persistence feature or do you want to keep iterating in a separate branch?

Comment thread components/salsa-macro-rules/src/setup_input_struct.rs Outdated
Comment thread src/cycle.rs Outdated
Comment thread src/database.rs Outdated
Comment on lines +205 to +210
// Ensure structs are serialized before tracked functions, as deserializing a
// memo requires its input struct to have been deserialized.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine deferring this to later but that will mean that we always serialize all tracked structs even if some query instances are never serialized (that created those tracked structs)

Comment thread src/database.rs
Comment thread src/database.rs Outdated
let DeserializeIngredients(zalsa) = self;

while let Some(name) = access.next_key::<&str>()? {
// TODO: Serialize the ingredient index directly.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A separate (maybe even optinal) field would make sense to me.

Comment thread src/function/memo.rs
Comment thread src/input.rs Outdated
Comment thread src/interned.rs
Comment thread src/table.rs
Comment thread tests/serialize.rs
Comment on lines +324 to +327
<dyn salsa::Database>::deserialize(
&mut db,
&mut serde_json::Deserializer::from_str(&serialized),
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does deserialize bump the revision to 2 to force a validation of all queries?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm no, should it?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are a few things to consider here

  • memos and other ingredients that weren't verified in the last revision (current when deserializing) still need to be dirty. The easiest here might be to only serialize ingredients that are verified in the current revision and we revisit this later
  • we need to bump the revision if any input changes during deserialization. Now, we don't support this yet so it's fine to ignore for now.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

memos and other ingredients that weren't verified in the last revision (current when deserializing) still need to be dirty. The easiest here might be to only serialize ingredients that are verified in the current revision and we revisit this later

We serialize and deserialize the current revision of the database, so I don't think this is a problem. Any ingredients that weren't verified in the last revision will have to be verified after deserialization.

@ibraheemdev ibraheemdev force-pushed the ibraheem/persistent-caching branch from d30ee9d to 5e1a984 Compare August 8, 2025 21:06
@ibraheemdev ibraheemdev force-pushed the ibraheem/persistent-caching branch from 7c469e0 to 03377ef Compare August 8, 2025 21:15
@ibraheemdev

Copy link
Copy Markdown
Member Author

I gated everything under a persistence feature. We should rework CI to use something like cargo-hack to test all the new feature permutations, but I'll do that in a separate PR.

@ibraheemdev ibraheemdev force-pushed the ibraheem/persistent-caching branch from 206d738 to 76a794b Compare August 8, 2025 22:35
@ibraheemdev ibraheemdev force-pushed the ibraheem/persistent-caching branch from 76a794b to b7e5658 Compare August 8, 2025 22:36
@ibraheemdev ibraheemdev force-pushed the ibraheem/persistent-caching branch from fecc37d to 4103798 Compare August 9, 2025 02:40
@ibraheemdev ibraheemdev added this pull request to the merge queue Aug 11, 2025
Merged via the queue into salsa-rs:master with commit 1ffb32f Aug 11, 2025
12 checks passed
@ibraheemdev ibraheemdev deleted the ibraheem/persistent-caching branch August 11, 2025 19:12
@github-actions github-actions Bot mentioned this pull request Aug 8, 2025
@github-actions github-actions Bot mentioned this pull request Oct 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants