Replace Address.hashCache Guava LoadingCache with Caffeine #10235
Open
diega wants to merge 1 commit into besu-eth:main
Conversation
Under heavy miss rates (the pre-EIP-150 DoS-era blocks spam BALANCE/EXTCODESIZE against tens of thousands of pseudo-random addresses per tx), Guava's per-segment ReentrantLock serialises the parallel tx executors on every account-touching EVM opcode. A thread dump of a stuck import thread on a Bonsai full-sync showed the thread parked on LocalCache$Segment.storeLoadedValue. Caffeine's load path is CAS-based (no segment write lock), and Caffeine is already the in-house cache library used elsewhere in Besu. Signed-off-by: Diego López León <dieguitoll@gmail.com>
Force-pushed f7ce64f to 7645f0f
PR description
Address.hashCache is a Guava LoadingCache<Address, Hash> (capped at 4 000 entries) that memoises the keccak-256 of every address. Under a heavy miss rate, Guava's per-segment ReentrantLock serialises the parallel tx executors on every account-touching EVM opcode. The pre-EIP-150 DoS-era blocks (~2 283 416 to 2 700 000) are the pathological case: the attack contracts loop BALANCE/EXTCODESIZE against tens of thousands of pseudo-random addresses per transaction. On a Bonsai full-sync through that range, import stalls; a kill -3 thread dump of the stuck import thread shows it parked on LocalCache$Segment.storeLoadedValue while the segment write lock is held.

Forest uses the same Address.hashCache but never triggered this: parallel tx processing (--bonsai-parallel-tx-processing-enabled) is Bonsai-only. With sequential tx execution only one thread is calling addressHash at a time, so Guava's segment locks are uncontended and the cache behaves fine. The concurrency required to expose the lock arrives with the parallel block processor.

The fix swaps Guava for Caffeine, keeping the same maximumSize(4_000). Caffeine's load path uses CAS-based bookkeeping with no segment write lock, which is exactly what is needed on this hot path. Caffeine is already the in-house cache library used across Besu (MemoryBoundCache, PathBasedCachedWorldStorageManager, all *PrecompiledContract result caches, JumpDestOnlyCodeCache), so this aligns with existing practice. Keeping the cap unchanged means the memory footprint is identical to today's behaviour (see below).

EVM opcodes that hit this path: BALANCE, EXTCODESIZE, EXTCODECOPY, EXTCODEHASH, CALL/CALLCODE/DELEGATECALL/STATICCALL (target), SELFDESTRUCT (beneficiary), CREATE/CREATE2 (newly derived address).

Thread dump excerpt (import thread parked on Guava segment lock)
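The change itself is small; a hedged sketch of its shape follows. The class name, field names, and the String stand-ins for the Address/Hash types are illustrative assumptions, not the actual Besu source.

```java
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;

// Illustrative sketch only: names and types are assumptions, not Besu source.
final class AddressHashCacheSketch {

  // Before (Guava): CacheBuilder.newBuilder().maximumSize(4_000).build(loader),
  //   whose load path takes a per-segment ReentrantLock.
  // After (Caffeine): same cap, but the load path is CAS-based.
  private static final LoadingCache<String, String> hashCache =
      Caffeine.newBuilder()
          .maximumSize(4_000) // unchanged historical cap
          .build(AddressHashCacheSketch::keccak256);

  // Cheap stand-in for the real keccak-256 loader.
  private static String keccak256(String address) {
    return "hash(" + address + ")";
  }

  static String addressHash(String address) {
    // Computes and caches on first access; subsequent hits avoid the loader.
    return hashCache.get(address);
  }
}
```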
Microbenchmark
10 worker threads run two workloads against the cache:
In the dos-miss workload, each BALANCE/EXTCODESIZE call targets a fresh pseudo-random address, so almost every call is a cache miss that has to invoke the loader. Typical laptop, ops/s aggregated across the 10 threads:
At the same 4k cap Caffeine is ~1.7× on hot and ~2.5× on miss, so the improvement is attributable to the library change (lock-free load path) rather than to a larger cache. For reference, Guava at maximumSize(1_000_000) only reaches ~535 000 ops/s on dos-miss, still well below Caffeine at the original 4k cap (~994 000), confirming the cap is not the variable that matters here.

Optional follow-up: raising the cap
If a larger cap is judged worth the memory, throughput scales further. Filling each cache to its cap and measuring retained heap after GC:
The 290 MB ceiling is only approached under a sustained address-diverse workload (e.g. a full-sync through pre-EIP-150 DoS blocks) and is permanently bounded by the cap. This PR keeps the historical 4k so the memory profile stays identical to today; bumping the cap is a one-line follow-up.
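For scale, the measured ceiling implies a per-entry retained footprint of roughly 300 bytes. A back-of-envelope check (assuming the 290 MB figure above is retained heap at the full 1 000 000-entry cap):

```java
// Back-of-envelope arithmetic only; the 290 MB input is the measured retained
// heap quoted above, everything else is derived from it.
public class CapFootprint {
  public static void main(String[] args) {
    long measuredBytes = 290L * 1024 * 1024;        // ~290 MB retained at the 1M cap
    long bytesPerEntry = measuredBytes / 1_000_000; // ≈ 304 bytes per entry
    long smallCapBytes = bytesPerEntry * 4_000;     // ≈ 1.2 MB at the historical 4k cap
    System.out.println(bytesPerEntry + " bytes/entry, ~"
        + smallCapBytes / 1024 + " KiB at the 4k cap");
  }
}
```

So the 4k cap this PR keeps costs on the order of a megabyte, while the 1M follow-up trades ~290 MB for the extra throughput.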
End-to-end expectation
Measured directly with Caffeine at maximumSize(1_000_000) on a full-sync through pre-EIP-150 blocks: per-block import time went from ~25 s to ~1.8 s (~14×, parallel tx processing on). That measurement is from the larger-cap variant, not the 4k cap this PR ships, but the dominant cost being eliminated is the LockSupport.park() stall on the segment lock, which goes away purely from switching libraries, regardless of cap. The throughput delta between Caffeine 4k and 1M in the bench is much smaller (~30 %) than the delta between Caffeine and Guava (~2.5×), so the 4k variant should retain most of the end-to-end improvement; back-of-envelope extrapolation puts it at ~10× (~25 s to ~2-3 s per block). Not directly measured.

Bench source (click to expand)
Drop under datatypes/src/test/java/org/hyperledger/besu/datatypes/ and run with ./gradlew :datatypes:test --tests AddressHashCacheBench.

Honest caveat
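The bench source itself is collapsed above. For readers without the expanded file, a minimal JDK-only sketch of the dos-miss shape (10 threads, fresh pseudo-random keys so nearly every call invokes the loader); here a plain ConcurrentHashMap stands in for the cache under test, and this is not the PR's actual AddressHashCacheBench:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the dos-miss workload: 10 threads, each lookup uses a fresh
// pseudo-random key, so almost every call runs the loader. Swap the map for a
// Guava or Caffeine cache to compare implementations.
public class MissBenchSketch {
  public static void main(String[] args) throws InterruptedException {
    ConcurrentHashMap<Long, Long> cache = new ConcurrentHashMap<>();
    AtomicLong ops = new AtomicLong();
    ExecutorService pool = Executors.newFixedThreadPool(10);
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(200);
    for (int t = 0; t < 10; t++) {
      pool.submit(() -> {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        while (System.nanoTime() < deadline) {
          long key = rnd.nextLong();                // fresh key -> near-certain miss
          cache.computeIfAbsent(key, k -> k * 31L); // cheap loader stand-in
          ops.incrementAndGet();
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(5, TimeUnit.SECONDS);
    System.out.println("ops in 200 ms across 10 threads: " + ops.get());
  }
}
```

The map here is unbounded, which is fine for a 200 ms run; the real bench bounds the cache at the caps discussed above.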
A bare ConcurrentHashMap was measurably faster than Caffeine in this specific bench (~12M / ~2M ops/s for hot / miss) because Keccak-256 of 20 bytes is cheap enough (~500 ns native) that Caffeine's TinyLFU bookkeeping exceeds the value of its eviction policy. A CHM with a size-bounded clear() was the fastest bounded option tested. Caffeine was preferred here for consistency with the rest of the codebase and to sidestep ad-hoc eviction logic; happy to swap if reviewers would rather optimize for raw throughput.

Fixed Issue(s)
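For reference, the "CHM with a size-bounded clear()" alternative mentioned above can look like the following. This is a hedged sketch of the idea, not the variant actually benchmarked, and the names are illustrative:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Crude-but-fast bounded cache: a ConcurrentHashMap whose only "eviction
// policy" is a wholesale clear() once the size cap is crossed.
final class BoundedChmCache<K, V> {
  private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
  private final int maxSize;
  private final Function<K, V> loader;

  BoundedChmCache(int maxSize, Function<K, V> loader) {
    this.maxSize = maxSize;
    this.loader = loader;
  }

  V get(K key) {
    // The racy size check is acceptable here: worst case the map briefly
    // overshoots the cap or two threads clear back-to-back; the values
    // returned are always correct, only recency is lost.
    if (map.size() >= maxSize) {
      map.clear();
    }
    return map.computeIfAbsent(key, loader);
  }
}
```

The trade-off versus Caffeine is exactly the one the caveat describes: no TinyLFU bookkeeping on the hot path, but also no real eviction policy, so a clear() throws away the hot entries along with the cold ones.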
n/a
Thanks for sending a pull request! Have you done the following?

Add the doc-change-required label to this PR if updates are required.

Locally, you can run these tests to catch failures early:

./gradlew spotlessApply
./gradlew build
./gradlew acceptanceTest
./gradlew integrationTest
./gradlew ethereum:referenceTests:referenceTests