Use clz32 for counting trailing zeroes. #9340
Ooh nice, I'll run a full end-to-end benchmark, but this looks promising. I had considered this, but I assumed the wasm instruction would be faster since ideally it would be a single CPU instruction. Maybe there's some overhead if the engine doesn't inline it.
Yeah, based on the numbers I've been seeing, I think the current implementations never end up inlining JS->Wasm calls; instead, even calls to simple functions end up allocating an execution context, so for single-word computations it's rarely worth it. That's especially true in this case, as the JS implementation should end up being JIT-compiled in the asm.js-like path on V8 (only arithmetic and no compute + assign operators).
This is my first PR to Parcel btw (about time..), what's up with the integration tests? Is it expected that they're failing?
Not sure how to interpret the benchmark results, as I'm not familiar with how much variance is expected 🤔
Another question: what's the desired process from here on out? Should I keep the branch up-to-date until it's reviewed, or wait for review and rebase once the changes are up to par? :)
Don't worry about the CI benchmark action, the actual numbers aren't really reliable. No need to keep this branch updated/rebased either: this PR's commit history will get squashed on merge anyway, so we'll handle that.
devongovett
left a comment
Tested on the React Spectrum website. I didn't see a meaningful overall build time change on that project, but this is definitely easier to read. I think it just isn't the main bottleneck anymore. Could have a bigger effect on an even larger project.
This changes the implementation of BitSet to use Math.clz32 to count trailing zeroes instead of the current WebAssembly-based approach. IMO this makes the implementation slightly more readable for those of us without a brain to parse hex-encoded WebAssembly. 😄

Additionally, this approach should yield slightly better performance when iterating over BitSets. At least that's what running the following micro-benchmark on my machine indicates:

Benchmark Source Code

Giving the output:
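Separately from the benchmark, here is a minimal sketch of the general trick for readers unfamiliar with it: counting trailing zeroes with Math.clz32 by first isolating the lowest set bit. The helper names `ctz32` and `forEachSetBit` are hypothetical and chosen for illustration; this is not necessarily the code merged in this PR, and returning 32 for a zero input is an assumed convention.

```javascript
// Hypothetical helper: count trailing zeroes of a 32-bit integer.
// Assumption: an input of 0 reports 32 (no bit is set).
function ctz32(n) {
  n = n | 0; // coerce to a signed 32-bit integer
  if (n === 0) return 32;
  // (n & -n) keeps only the lowest set bit, i.e. a power of two 2^k.
  // Math.clz32(2^k) is 31 - k, so subtracting from 31 gives k.
  return 31 - Math.clz32(n & -n);
}

// Hypothetical helper: call fn with the index of every set bit
// in a single 32-bit word, lowest index first.
function forEachSetBit(word, fn) {
  let v = word | 0;
  while (v !== 0) {
    fn(ctz32(v));
    v = v & (v - 1); // clear the lowest set bit
  }
}

// Usage: collect the set bit indices of 0b1010.
const indices = [];
forEachSetBit(0b1010, (i) => indices.push(i));
console.log(ctz32(8), indices); // 3 [ 1, 3 ]
```

The sketch sticks to plain arithmetic and bitwise operators with ordinary assignment, in line with the asm.js-like style mentioned earlier in the thread. Whether that style actually affects JIT behavior is the author's observation, not something this example demonstrates.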