Speed up MAS token introspection#18357
Conversation
| // TODO: Is it safe to assert unwind safety here? I think so, as we | ||
| // don't use anything that could be tainted by the panic afterwards. | ||
| // Note that `.spawn(..)` asserts unwind safety on the future too. | ||
| let res = AssertUnwindSafe(fut).catch_unwind().await; |
There was a problem hiding this comment.
alternatively, spawn the future on the runtime, await on the handle, and it will give you an Err with the panic in case it panics
There was a problem hiding this comment.
We could, though then you end up spawning two tasks per function, rather than one. Probably not a huge deal, but feels a bit bleurgh
There was a problem hiding this comment.
Spawning is cheap, let's do that instead please. Also we'll need to spawn a separate task anyway if we want to properly support cancel
There was a problem hiding this comment.
Can you expand on why you want to use new tasks please? I don't see the benefit of spawning a new task to just wait on it, semantically you end up with a bunch of tasks with different IDs all for the same work. In future, if we wanted to start tracking tasks and e.g. their resource usage then using multiple tasks makes that more complicated.
I also don't think we need a separate task for cancellation necessarily. You can change this line to do a select on both fut and the cancellation future.
There was a problem hiding this comment.
I'm not really confortable with AssertUnwindSafe being used so broadly. Tasks are cheap to spawn, and I don't think we'd want to base our potential resource consumption measurement in Rust-world to Tokio task IDs?
Anyway, even though AssertUnwindSafe smells like a bad thing waiting to happen, I won't block this PR further because of this if you're not convinced that spawning is fine
There was a problem hiding this comment.
I'm not really confortable with
AssertUnwindSafebeing used so broadly. Tasks are cheap to spawn, and I don't think we'd want to base our potential resource consumption measurement in Rust-world to Tokio task IDs?
I think we would do this at the task level, we'd have a task-local context that records resource usage, so you could e.g. wrap the top-level future that records the resource consumption of poll, or have DB functions record transaction times/etc. When spawning new tasks you'd want to decide if the resources of the task get allocated to the current task or to a new one.
Anyway, even though AssertUnwindSafe smells like a bad thing waiting to happen, I won't block this PR further because of this if you're not convinced that spawning is fine
Bear in mind that this is exactly what spawning a task does in tokio, so its hard to see how it would be fine for that and not here.
| static ref DEFERRED_CLASS: PyObject = { | ||
| Python::with_gil(|py| { | ||
| py.import("twisted.internet.defer") | ||
| .expect("module 'twisted.internet.defer' should be importable") |
There was a problem hiding this comment.
I'm not a fan of panicking like that, not sure what will happen if it's the case
There was a problem hiding this comment.
This will cause a panic the first and subsequent derefs. Given that this shouldn't ever fail I prefer having the initialisation closer to the definition for clarities sake.
I've added an explicit call to these functions in the init of HttpClient so that if it ever did fail, it'd fail at startup.
This is because openssl cannot be used in a manylinux context due to lack of stable ABI.
| // Make sure we fail early if we can't build the lazy statics. | ||
| LazyLock::force(&RUNTIME); | ||
| LazyLock::force(&DEFERRED_CLASS); |
There was a problem hiding this comment.
Could be done in the module initialisation?
There was a problem hiding this comment.
Annoyingly, we can't import twisted reactor at this stage as it happens too early. See comment I left in HttpClient::new
| let mut stream = response.bytes_stream(); | ||
| let mut buffer = Vec::new(); | ||
| while let Some(chunk) = stream.try_next().await.context("reading body")? { | ||
| if buffer.len() + chunk.len() > response_limit { | ||
| Err(anyhow::anyhow!("Response size too large"))?; | ||
| } | ||
|
|
||
| buffer.extend_from_slice(&chunk); | ||
| } |
There was a problem hiding this comment.
I believe you can achieve the same with http_body_util::Limited; reqwest::Response implements Into<Body>
There was a problem hiding this comment.
Yeah, originally tried that but it messed up the errors (one of the exceptions stops implementing std Error). Given how straight forwards this is it felt easier than faffing with error types.
| // TODO: Is it safe to assert unwind safety here? I think so, as we | ||
| // don't use anything that could be tainted by the panic afterwards. | ||
| // Note that `.spawn(..)` asserts unwind safety on the future too. | ||
| let res = AssertUnwindSafe(fut).catch_unwind().await; |
There was a problem hiding this comment.
Spawning is cheap, let's do that instead please. Also we'll need to spawn a separate task anyway if we want to properly support cancel
| /// The tokio runtime that we're using to run async Rust libs. | ||
| static RUNTIME: LazyLock<Runtime> = LazyLock::new(|| { | ||
| tokio::runtime::Builder::new_multi_thread() | ||
| .worker_threads(4) |
There was a problem hiding this comment.
We'll likely want to have that configurable at some point, but this is probably a sane default.
|
Friendly ping @sandhose |
| // TODO: Is it safe to assert unwind safety here? I think so, as we | ||
| // don't use anything that could be tainted by the panic afterwards. | ||
| // Note that `.spawn(..)` asserts unwind safety on the future too. | ||
| let res = AssertUnwindSafe(fut).catch_unwind().await; |
There was a problem hiding this comment.
I'm not really confortable with AssertUnwindSafe being used so broadly. Tasks are cheap to spawn, and I don't think we'd want to base our potential resource consumption measurement in Rust-world to Tokio task IDs?
Anyway, even though AssertUnwindSafe smells like a bad thing waiting to happen, I won't block this PR further because of this if you're not convinced that spawning is fine
| with PreserveLoggingContext(): | ||
| resp_body = await self._rust_http_client.post( | ||
| url=uri, | ||
| response_limit=1 * 1024 * 1024, | ||
| headers=raw_headers, | ||
| request_body=body, | ||
| ) |
There was a problem hiding this comment.
What's the reasoning behind using PreserveLoggingContext() here? Why do we want to reset the LoggingContext to the SENTINEL_CONTEXT during the operation and then restore the old context?
As far as I can tell we're not doing any sort of fire-and-forget here where this would matter
Perhaps, it's because the way the Rust HTTP client handles the deferreds? In any case, it seems like we should have some wrapper around it that uses make_deferred_yieldable(...) to make things right so we don't have to do this in the downstream code.
There was a problem hiding this comment.
Perhaps, it's because the way the Rust HTTP client handles the deferreds? In any case, it seems like we should have some wrapper around it that uses
make_deferred_yieldable(...)to make things right so we don't have to do this in the downstream code.
The returned deferred does not follow the log context rules, so we need to make it follow the rules. The make_deferred_yieldable(...) function is a way of doing so, but it is equivalent to using with PreserveLoggingContext():, i.e. it clears the logcontext before awaiting (and so before execution passes back to the reactor) and restores the old context once the awaitable completes (execution passes from the reactor back to the code).
There was a problem hiding this comment.
it seems like we should have some wrapper around it that uses
make_deferred_yieldable(...)to make things right so we don't have to do this in the downstream code.
Addressing this in #18903
So downstream usage doesn't need to use `PreserveLoggingContext()` or `make_deferred_yieldable` Spawning from #18870 and #18357 (comment)
So downstream usage doesn't need to use `PreserveLoggingContext()` or `make_deferred_yieldable` Spawning from #18870 and #18357 (comment)
Wrap the Rust HTTP client with `make_deferred_yieldable` so downstream usage doesn't need to use `PreserveLoggingContext()` or `make_deferred_yieldable`. > it seems like we should have some wrapper around it that uses [`make_deferred_yieldable(...)`](https://github.com/element-hq/synapse/blob/40edb10a98ae24c637b7a9cf6a3003bf6fa48b5f/docs/log_contexts.md#where-you-create-a-new-awaitable-make-it-follow-the-rules) to make things right so we don't have to do this in the downstream code. > > *-- @MadLittleMods, #18357 (comment) Spawning from wanting to [remove `PreserveLoggingContext()` from the codebase](#18870) and thinking that we [shouldn't have to pollute all downstream usage with `PreserveLoggingContext()` or `make_deferred_yieldable`](#18357 (comment)) Part of #18905 (Remove `sentinel` logcontext where we log in Synapse)
*As suggested by @sandhose in #19498 (comment) Simplify Rust HTTP client response streaming and limiting ### Dev notes Synapse's Rust HTTP client was introduced in #18357 ### Pull Request Checklist <!-- Please read https://element-hq.github.io/synapse/latest/development/contributing_guide.html before submitting your pull request --> * [x] Pull request is based on the develop branch * [x] Pull request includes a [changelog file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog). The entry should: - Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from `EventStore` to `EventWorkerStore`.". - Use markdown where necessary, mostly for `code blocks`. - End with either a period (.) or an exclamation mark (!). - Start with a capital letter. - Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry. * [x] [Code style](https://element-hq.github.io/synapse/latest/code_style.html) is correct (run the [linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
We do this by shoving it into Rust. We believe our python http client is a bit slow.
Also bumps minimum rust version to 1.81.0, released last September (over six months ago)
To allow for async Rust, includes some adapters between Tokio in Rust and the Twisted reactor in Python.