
Replace Address.hashCache Guava LoadingCache with Caffeine #10235

Open

diega wants to merge 1 commit into besu-eth:main from diega:fix/address-hash-cache-contention

Conversation

@diega (Contributor) commented on Apr 14, 2026

PR description

Address.hashCache is a Guava LoadingCache<Address, Hash> (capped at 4 000 entries) that memoises the keccak-256 of every address. Under a heavy miss rate, Guava's per-segment ReentrantLock serialises the parallel tx executors on every account-touching EVM opcode. The pre-EIP-150 DoS-era blocks (~2 283 416 to 2 700 000) are the pathological case: the attack contracts loop BALANCE / EXTCODESIZE against tens of thousands of pseudo-random addresses per transaction. On a Bonsai full-sync through that range, import stalls; a kill -3 thread dump of the stuck import thread shows it parked in LocalCache$Segment.lockedGetOrLoad, waiting for the segment write lock.

Forest uses the same Address.hashCache but never triggered this: parallel tx processing (--bonsai-parallel-tx-processing-enabled) is Bonsai-only. With sequential tx execution only one thread is calling addressHash at a time, so Guava's segment locks are uncontended and the cache behaves fine. The concurrency required to expose the lock arrives with the parallel block processor.

The fix swaps Guava for Caffeine, keeping the same maximumSize(4_000). Caffeine's load path uses CAS-based bookkeeping with no segment write lock, which is exactly what is needed on this hot path. Caffeine is already the in-house cache library used across Besu (MemoryBoundCache, PathBasedCachedWorldStorageManager, all *PrecompiledContract result caches, JumpDestOnlyCodeCache) so this aligns with existing practice. Keeping the cap unchanged means memory footprint is identical to today's behaviour (see below).
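
As a concrete sketch of the post-change shape (class and field names here are illustrative, not a verbatim copy of Address.java; the Guava LoadingCache/CacheLoader pair it replaces is visible in the bench source below):

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

// Illustrative sketch only — not the exact diff this PR ships.
public final class AddressHashCacheSketch {
  // Same 4k cap as before; Caffeine installs loaded values with CAS-based
  // bookkeeping instead of Guava's per-segment ReentrantLock.
  private static final Cache<Address, Hash> HASH_CACHE =
      Caffeine.newBuilder().maximumSize(4_000).build();

  public static Hash addressHash(final Address address) {
    // get(key, mappingFunction) computes and caches on a miss; hits are lock-free reads.
    return HASH_CACHE.get(address, Hash::hash);
  }
}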

EVM opcodes that hit this path: BALANCE, EXTCODESIZE, EXTCODECOPY, EXTCODEHASH, CALL / CALLCODE / DELEGATECALL / STATICCALL (target), SELFDESTRUCT (beneficiary), CREATE / CREATE2 (newly derived address).

Thread dump excerpt (import thread parked on Guava segment lock)

"EthScheduler-Services-6 (importBlock)" WAITING (parking)
    at jdk.internal.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:322)
    at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2113)
    at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4006)
    at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4946)
    at org.hyperledger.besu.datatypes.Address.addressHash(Address.java:248)
    at o.h.b.e.t.pathbased.bonsai.worldview.BonsaiWorldState.get(...)
    at o.h.b.e.t.pathbased.common.worldview.accumulator.PathBasedWorldStateUpdateAccumulator.loadAccount(...)
    at o.h.b.e.t.pathbased.common.worldview.accumulator.PathBasedWorldStateUpdateAccumulator.getForMutation(...)
    at o.h.b.evm.worldstate.AbstractWorldUpdater.get(AbstractWorldUpdater.java:102)
    at o.h.b.evm.operation.AbstractOperation.getAccount(AbstractOperation.java:101)

Microbenchmark

10 worker threads run two workloads against the cache:

  • hot-set: 32 distinct addresses repeatedly hashed. Models normal block execution (sender, recipient, coinbase, popular contracts) where the same addresses are touched many times. Almost every call is a cache hit.
  • dos-miss: 50 000 distinct addresses per thread, each hashed exactly once. Models the pre-EIP-150 DoS pattern where every BALANCE / EXTCODESIZE targets a fresh pseudo-random address. Almost every call is a cache miss that has to invoke the loader.

Typical laptop, ops/s aggregated across the 10 threads:

Impl                     | hot-set throughput | dos-miss throughput
Guava LoadingCache (4k)  | 2 600 000 ops/s    | 393 000 ops/s
Caffeine (4k)            | 4 400 000 ops/s    | 994 000 ops/s

At the same 4k cap, Caffeine is ~1.7× faster on hot-set and ~2.5× faster on dos-miss, so the improvement is attributable to the library change (lock-free load path) rather than to a larger cache. For reference, Guava at maximumSize(1_000_000) only reaches ~535 000 ops/s on dos-miss, still well below Caffeine at the original 4k cap (~994 000 ops/s), confirming the cap is not the variable that matters here.

Optional follow-up: raising the cap

If a larger cap is judged worth the memory, throughput scales further. Filling each cache to its cap and measuring retained heap after GC:

Impl          | dos-miss throughput | heap with cache full
Caffeine (4k) | 994 000 ops/s       | ~15 MB
Caffeine (1M) | 1 300 000 ops/s     | ~290 MB
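
The PR does not show the exact harness used for the heap column; a crude GC-based helper that could sit next to the bench below would look like this (method name hypothetical; a heap dump inspected with MAT gives the precise figure):

  static long retainedHeapAfterFill(final Cache<Address, Hash> cache, final Address[] keys) {
    for (final Address a : keys) {
      cache.get(a, Hash::hash); // fill to (at most) the cap
    }
    System.gc(); // best-effort hint only
    final Runtime rt = Runtime.getRuntime();
    return rt.totalMemory() - rt.freeMemory();
  }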

The 290 MB ceiling is only approached under a sustained address-diverse workload (e.g. a full-sync through pre-EIP-150 DoS blocks) and is permanently bounded by the cap. This PR keeps the historical 4k so the memory profile stays identical to today; bumping the cap is a one-line follow-up.
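
For reference, that one-line follow-up is just the cap in the builder (sketch, matching the 1M row above):

  Caffeine.newBuilder().maximumSize(1_000_000).build();  // was maximumSize(4_000)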

End-to-end expectation

Measured directly with Caffeine at maximumSize(1_000_000) on a full-sync through pre-EIP-150 blocks: per-block import time went from ~25 s to ~1.8 s (~14×, parallel tx processing on). That measurement is from the larger-cap variant, not the 4k cap this PR ships, but the dominant cost being eliminated is the LockSupport.park() stall on the segment lock, which goes away purely from switching libraries, regardless of cap. The throughput delta between Caffeine 4k and 1M in the bench is much smaller (~30 %) than the delta between Caffeine and Guava (~2.5×), so the 4k variant should retain most of the end-to-end improvement; back-of-envelope extrapolation puts it at ~10× (~25 s to ~2-3 s per block). Not directly measured.

Bench source

Drop under datatypes/src/test/java/org/hyperledger/besu/datatypes/ and run with ./gradlew :datatypes:test --tests AddressHashCacheBench.

package org.hyperledger.besu.datatypes;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

import java.util.Random;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

import org.apache.tuweni.bytes.Bytes;
import org.junit.jupiter.api.Test;

public class AddressHashCacheBench {

  // Guava LoadingCache mirroring the current Address.hashCache (4k cap).
  static final LoadingCache<Address, Hash> GUAVA =
      CacheBuilder.newBuilder()
          .maximumSize(4_000)
          .build(
              new CacheLoader<>() {
                @Override
                public Hash load(final Address k) {
                  return Hash.hash(k);
                }
              });

  // Caffeine cache at the same cap; the loader is supplied at each call site.
  static final Cache<Address, Hash> CAFFEINE =
      Caffeine.newBuilder().maximumSize(4_000).build();

  @Test
  void bench() throws Exception {
    final int threads = 10, perThread = 50_000;
    Random rnd = new Random(42);

    // dos-miss workload: 50k unique pseudo-random addresses per thread,
    // each hashed exactly once (almost every call is a miss).
    Address[][] dos = new Address[threads][];
    for (int t = 0; t < threads; t++) {
      dos[t] = new Address[perThread];
      for (int i = 0; i < perThread; i++) {
        byte[] a = new byte[20];
        rnd.nextBytes(a);
        dos[t][i] = Address.wrap(Bytes.wrap(a));
      }
    }

    // hot-set workload: 32 addresses shared by all threads (almost every call is a hit).
    Address[] hot = new Address[32];
    for (int i = 0; i < hot.length; i++) {
      byte[] a = new byte[20];
      rnd.nextBytes(a);
      hot[i] = Address.wrap(Bytes.wrap(a));
    }

    // Warm-up: populate both caches with the hot set and let the call paths JIT-compile.
    for (int i = 0; i < 10_000; i++) {
      GUAVA.getUnchecked(hot[i & 31]);
      CAFFEINE.get(hot[i & 31], k -> Hash.hash(k));
    }

    for (String name : new String[] {"GUAVA", "CAFFEINE"}) {
      Function<Address, Hash> f = name.equals("GUAVA")
          ? (Function<Address, Hash>) GUAVA::getUnchecked
          : a -> CAFFEINE.get(a, k -> Hash.hash(k));
      run(name + "  hot-set ", threads, perThread, f, (tid, i) -> hot[(tid + i) & 31]);
      run(name + "  dos-miss", threads, perThread, f, (tid, i) -> dos[tid][i]);
      if (name.equals("GUAVA")) GUAVA.invalidateAll(); else CAFFEINE.invalidateAll();
    }
  }

  interface PickAddr { Address pick(int threadId, int i); }

  static void run(String label, int threads, int perThread, Function<Address, Hash> fn, PickAddr pick) throws Exception {
    ExecutorService es = Executors.newFixedThreadPool(threads);
    CountDownLatch start = new CountDownLatch(1), done = new CountDownLatch(threads);
    AtomicLong ops = new AtomicLong();
    for (int t = 0; t < threads; t++) {
      final int tid = t;
      es.submit(() -> {
        try { start.await(); } catch (InterruptedException ignored) {}
        long local = 0;
        for (int i = 0; i < perThread; i++) {
          if (fn.apply(pick.pick(tid, i)) == null) throw new AssertionError();
          local++;
        }
        ops.addAndGet(local);
        done.countDown();
      });
    }
    long t0 = System.nanoTime();
    start.countDown(); done.await();
    long ns = System.nanoTime() - t0;
    es.shutdown(); es.awaitTermination(1, TimeUnit.SECONDS);
    System.out.printf("%-22s  ops=%,d  elapsed=%6.2f s  throughput=%,d ops/s%n",
        label, ops.get(), ns / 1e9, (long)(ops.get() / (ns / 1e9)));
  }
}

Honest caveat

A bare ConcurrentHashMap was measurably faster than Caffeine in this specific bench (~12M / ~2M ops/s on hot-set / dos-miss) because Keccak-256 over 20 bytes is cheap enough (~500 ns native) that the cost of Caffeine's TinyLFU bookkeeping outweighs the benefit of its eviction policy. A CHM with a size-bounded clear() was the fastest bounded option tested. Caffeine was preferred here for consistency with the rest of the codebase and to sidestep ad-hoc eviction logic; happy to swap if reviewers would rather optimise for raw throughput.
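
For reviewers weighing that trade-off, a sketch of the bounded-CHM variant described above (class and constant names are hypothetical; this is not what the PR ships):

import java.util.concurrent.ConcurrentHashMap;

// Hypothetical alternative, sketched for comparison only.
final class ChmAddressHashCache {
  private static final int MAX_ENTRIES = 4_000; // same cap as the PR
  private static final ConcurrentHashMap<Address, Hash> CACHE =
      new ConcurrentHashMap<>(MAX_ENTRIES);

  static Hash addressHash(final Address address) {
    final Hash hash = CACHE.computeIfAbsent(address, Hash::hash);
    // Crude "eviction": once the map overshoots the cap, drop everything.
    // mappingCount() is approximate under concurrency, so the bound is soft.
    if (CACHE.mappingCount() > MAX_ENTRIES) {
      CACHE.clear();
    }
    return hash;
  }
}

The throughput win comes precisely from doing no recency/frequency tracking at all, at the cost of periodically discarding hot entries whenever the cap is hit.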

Fixed Issue(s)

n/a

Thanks for sending a pull request! Have you done the following?

  • Checked out our contribution guidelines?
  • Considered documentation and added the doc-change-required label to this PR if updates are required.
  • Considered the changelog and included an update if required.
  • For database changes (e.g. KeyValueSegmentIdentifier) considered compatibility and performed forwards and backwards compatibility tests

Locally, you can run these tests to catch failures early:

  • spotless: ./gradlew spotlessApply
  • unit tests: ./gradlew build
  • acceptance tests: ./gradlew acceptanceTest
  • integration tests: ./gradlew integrationTest
  • reference tests: ./gradlew ethereum:referenceTests:referenceTests

Commit message (7645f0f):

Under a heavy miss rate (pre-EIP-150 DoS-era blocks spam BALANCE/EXTCODESIZE
against tens of thousands of pseudo-random addresses per tx) Guava's per-segment
ReentrantLock serialises parallel tx executors on every account-touching EVM
opcode. A thread dump of a stuck import thread on a Bonsai full-sync showed the
thread parked in LocalCache$Segment.lockedGetOrLoad waiting for the segment
write lock.

Caffeine's load path is CAS-based (no segment write lock), and Caffeine is
already the in-house cache library used elsewhere in Besu.

Signed-off-by: Diego López León <dieguitoll@gmail.com>
@diega force-pushed the fix/address-hash-cache-contention branch from f7ce64f to 7645f0f on April 14, 2026 at 21:13