diff --git a/docs/design/compilation-benchmarking.md b/docs/design/compilation-benchmarking.md new file mode 100644 index 00000000..6fe3ac2b --- /dev/null +++ b/docs/design/compilation-benchmarking.md @@ -0,0 +1,307 @@ +# Design: Benchmarking compilation with `@benchmark` + +## Motivation + +Today `@benchmark foo($x)` measures steady-state runtime. Compilation of `foo` +(and its call tree) happens at most once, either in the warm-up call or in the +first sample. This means: + +- Compile-time is not reported in a statistically meaningful way. +- Users wanting to characterize "time-to-first-execution" or track + regressions in inference/codegen cost have to roll their own harness + (typically using `@time` in a fresh process, or `SnoopCompile`). + +Goal: add a first-class mode + +```julia +@benchmark foo($x) compilation=true +``` + +that produces a `Trial` whose samples are compilation times (and optionally +inference / LLVM / allocations breakdown), with all the existing statistics +machinery (min/median/mean/std, tuning, comparison, regression detection). + +## Requirements + +1. **Repeatability**: each sample must actually recompile the code under + measurement. Running the same expression twice without intervention will + hit the cache on the second call. +2. **Scoped invalidation**: we must *only* invalidate methods reachable from + the expression under test. Blowing away all caches (`jl_drop_all_caches`) + would force `BenchmarkTools`, the REPL, and the test harness itself to + recompile on every sample, making the measurement meaningless and orders + of magnitude slower than the thing we want to measure. +3. **Low measurement overhead**: the recording/invalidation step is *not* + part of the sample; only the recompile + run is timed. +4. **Composable with existing `Parameters`**: `samples`, `evals`, + `seconds`, `gctrial`, etc. should continue to work. `evals` per sample + should probably be forced to 1 (each eval would otherwise share a cache). +5. 
**No dependency on external packages** (SnoopCompile, Cthulhu). The core
+   capability must live in `Base`/`Core` so that `BenchmarkTools` can depend
+   only on the standard library.
+
+## Proposed surface
+
+### In `Base` (new, internal-but-public)
+
+```julia
+# Returns a collection of MethodInstance (or CodeInstance) objects that were
+# actually executed while running `ex`. Equivalent in spirit to a
+# `--trace-compile` trace, but captured in-process and returning live handles
+# rather than strings. (Bound to `mis` rather than `methods` so the example
+# does not shadow `Base.methods`.)
+mis = Base.@record_calls foo(x)
+
+# Drops native code + inferred IR for the given MethodInstances, such that
+# the next dispatch to each one will re-infer and re-codegen.
+# Does NOT bump the global world age and does NOT touch any MI not in the set.
+Base.invalidate_calls(mis)
+```
+
+Optionally a convenience:
+
+```julia
+Base.@with_recompilation foo(x)  # record, invalidate, return @timed result
+```
+
+### In `BenchmarkTools`
+
+```julia
+@benchmark foo($x) compilation=true
+@benchmark foo($x) compilation=:full            # infer + codegen (default)
+@benchmark foo($x) compilation=:codegen_only    # keep inferred IR, drop native
+@benchmark foo($x) compilation=:inference_only
+```
+
+Trial samples would store (time_ns, compile_time_ns, recompile_time_ns,
+inference_time_ns, gc_time_ns, bytes, allocs) — essentially the `NamedTuple`
+already produced by `@timed`.
+
+## Design options
+
+The hard part is requirements 1 and 2 together: *scoped* invalidation. Four
+options follow, roughly in increasing order of invasiveness in Base.
+
+### Option A: Re-eval in a fresh anonymous module per sample
+
+Sketch: wrap the expression in `@eval Module() begin ... end`. Each sample
+defines a new closure in a throwaway module, which forces codegen for the
+wrapper. The inner callee (`foo`) is still cached, though — so this only
+measures specialization of the wrapper, not of `foo` itself. Rejected as
+insufficient.
+
+### Option B: Global cache drop per sample
+
+`Base.drop_all_caches()` already exists.
Pros: trivial to implement, no new
+API. Cons:
+
+- Recompiles everything the harness touches between samples (printing,
+  timing, `Statistics.quantile`, ...). Samples become dominated by
+  harness recompilation, not by `foo`.
+- The world-age bump changes the semantics of captured closures.
+- Samples are not independent: the Nth sample recompiles strictly less than
+  the 1st because some harness code stays hot.
+
+Useful as a `compilation=:nuclear` debugging escape hatch, but not the
+default.
+
+### Option C: `--trace-compile` hook + per-MI invalidation (recommended)
+
+Two new pieces of machinery:
+
+**C.1 Recording.** Expose the existing trace-compile infrastructure as an
+in-process callback rather than a text stream. The C runtime already
+notifies on every `jl_generate_fptr` / inference entry (see
+`jl_force_trace_compile_timing_enable` in `base/timing.jl` and the
+`trace_compile` option). Add:
+
+```c
+// src/gf.c / codegen.cpp
+JL_DLLEXPORT void jl_set_trace_compile_callback(
+    void (*cb)(jl_method_instance_t*, int /*is_recompile*/, void*),
+    void *ctx);
+```
+
+and a Julia wrapper:
+
+```julia
+# base/reflection.jl or base/compiler/...
+function record_calls(f)
+    seen = IdSet{Core.MethodInstance}()
+    # The callback arity mirrors the C signature: (mi, is_recompile).
+    cb = (mi, is_recompile) -> push!(seen, mi)
+    prev = _set_trace_compile_callback(cb)
+    try
+        Base.invokelatest(f)
+    finally
+        _set_trace_compile_callback(prev)
+    end
+    return seen
+end
+
+macro record_calls(ex)
+    :(record_calls(() -> $(esc(ex))))
+end
+```
+
+This piggybacks on infrastructure that already exists for
+`--trace-compile` and `Base.@trace_compile`. No new instrumentation points
+in the compiler.
+
+**Handling already-compiled code.** A critical subtlety: by the time the
+user types `@btime foo($x) compilation=true`, `foo(x)` may already be fully
+compiled (from an earlier REPL call, from precompilation, or from a
+pkgimage). A naive `@record_calls foo(x)` would then observe *nothing*,
+because the trace-compile callback only fires on actual codegen.
The
+recording pass must therefore force a compile of the target expression, not
+just run it. Two strategies:
+
+1. **Record-by-invalidate-then-run (recommended).** Start with a sentinel
+   MI set (e.g. the entry-point `MethodInstance` of the call `foo(x)`,
+   obtainable via `Base.method_instance` / `Core.Compiler.specialize_method`).
+   Invalidate that single MI, then run `foo(x)` under the trace-compile
+   callback. Because the entry point is now uncompiled, dispatching to it
+   re-enters codegen, which in turn recursively forces codegen of any of
+   its callees whose native code has been dropped — and, crucially, also
+   reports any callees that were *already* compiled but had to be
+   re-specialized. Callees that stay cached will not appear, but that is
+   the correct answer: we don't want to recompile them on every sample
+   either. The captured `seen` is then the exact set we re-invalidate per
+   sample.
+2. **Snapshot + diff.** Snapshot all `MethodInstance`s (or just those
+   reachable via `Base.specializations` from the target method) at entry,
+   run `f`, diff. This works even if nothing new compiles — the "diff" is
+   empty and we fall back to `{entry_point_mi}` alone. Simpler, but it
+   misses indirect callees that were already compiled.
+
+Strategy (1) records in the same spirit as `SnoopCompile.@snoopi_deep`, and
+it handles the already-compiled case correctly: the forced invalidation of
+the entry point guarantees at least one codegen event, which then cascades.
+
+In both strategies the first call in the benchmark sequence does *double
+duty*: it populates `seen` and produces the first sample. Subsequent
+samples just run `invalidate_calls(seen); @timed foo(x)` in a loop.
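+
+The per-sample sequence can be sketched end to end. This is illustrative
+pseudocode written against the *proposed* primitives (`Base.record_calls`,
+`Base.invalidate_calls`, and `Base.method_instance` do not exist yet, and
+the helper name `compile_time_samples` is made up for the sketch):
+
+```julia
+function compile_time_samples(f, args...; n::Integer=10)
+    times = Float64[]
+    # Sample 1 does double duty: invalidate the entry point so codegen is
+    # forced, record every MethodInstance compiled along the way, and time it.
+    mi = Base.method_instance(f, Tuple{map(Core.Typeof, args)...})
+    Base.invalidate_calls(Core.MethodInstance[mi])
+    local seen
+    stats = Base.@timed (seen = Base.record_calls(() -> f(args...)))
+    push!(times, stats.time)
+    # Samples 2..n: flush exactly the recorded set, then time one fresh call.
+    # `invokelatest` keeps the call from devirtualizing into this function's
+    # own native code, so dispatch re-enters the invalidated MI.
+    for _ in 2:n
+        Base.invalidate_calls(seen)
+        stats = Base.@timed Base.invokelatest(f, args...)
+        push!(times, stats.time)
+    end
+    return times
+end
+```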
**C.2 Invalidation.** Expose per-MI cache dropping:
+
+```c
+JL_DLLEXPORT void jl_mi_clear_native_code(jl_method_instance_t *mi);
+JL_DLLEXPORT void jl_mi_clear_inferred(jl_method_instance_t *mi);
+```
+
+and
+
+```julia
+function invalidate_calls(mis; inferred::Bool=true, native::Bool=true)
+    for mi in mis
+        native && ccall(:jl_mi_clear_native_code, Cvoid, (Any,), mi)
+        inferred && ccall(:jl_mi_clear_inferred, Cvoid, (Any,), mi)
+    end
+end
+```
+
+The implementation can lean on `invalidate_method_instance_caches` already
+present in `src/gf.c`, but *without* the world-age bump that
+`jl_method_table_disable` performs — we are not making a semantic change,
+just dropping cached results. This is the key novelty: today's invalidation
+APIs all assume the reason for invalidation is a method edit, so they bump
+the world. For benchmarking we want a pure cache flush.
+
+Concerns and how to address them:
+
+- **`@generated` functions / cfunctions / `precompile`d code**: some MIs
+  are pinned. `invalidate_calls` should silently skip what it cannot drop
+  and optionally report it. BenchmarkTools would surface a warning like
+  `"17/342 methods could not be invalidated and will not be re-timed"`.
+- **Backedges**: dropping native code for `mi` does not need to propagate
+  along backedges: dynamically dispatched callers will re-enter `mi` and
+  trigger codegen on demand. Call sites that were devirtualized to a direct
+  function pointer *do* keep the stale code, which is why the benchmark
+  harness must enter the invalidated entry point via `invokelatest` rather
+  than through a baked-in call.
+- **Concurrency / world age**: since we do not bump the world, other tasks
+  can keep running; they will just pay codegen cost if they happen to call
+  one of the invalidated MIs concurrently. BenchmarkTools already assumes
+  sole ownership of the machine during a sample, so this is acceptable.
+- **Inlined callees**: if `bar` was inlined into `foo`, dropping `bar`'s
+  native code does nothing — the code is duplicated inside `foo`'s native
+  image. Dropping `foo` handles this correctly.
This matches user + expectation: `@benchmark foo($x) compilation=true` should measure the + cost of compiling `foo` (with its current inlining decisions), not each + inlinee independently. + +### Option D: Process-level isolation + +Run each sample in a fresh `julia` subprocess with +`--compile=all --trace-compile=...`, parse timings out of stdout. This is +what benchmarking-for-TTFX tools (`PkgEval`, `SnoopCompile`'s +`@snoopi_deep` with `flamegraph`) effectively do. + +Pros: perfectly isolated, no API surface in Base. +Cons: + +- Sample time ≈ 1–5 s of Julia startup + sysimage load, dominating what + we want to measure for anything small. +- Cannot interpolate live Julia values (`$x`); would need serialization. +- Poor fit for `@benchmark`'s sampling loop. + +Reasonable as a future `@benchmark_compile_isolated` macro, not as the +primary mechanism. + +## Recommendation + +Implement **Option C** in two PRs: + +1. **julia PR**: add `Base.@record_calls` / `Base.record_calls` and + `Base.invalidate_calls` (names open for bikeshedding — `Base.Compiler` + may be a better home). Internally reuse the trace-compile callback + machinery and `invalidate_method_instance_caches`. Mark them as + experimental (`Base.Experimental`) for the first release. + +2. **BenchmarkTools PR**: add a `compilation::Union{Bool,Symbol}` + parameter to `Parameters`, wire it through `Benchmark.sample`, and + store per-sample compile/infer/codegen/gc times. `evals` is forced + to 1 when `compilation !== false`. Add a `ratio`/`judge` path for + comparing compile-time trials just like runtime trials. + +## Open questions + +1. Should `invalidate_calls` take `MethodInstance`s, `CodeInstance`s, or + both? `CodeInstance` is finer-grained (per-world, per-signature) and is + what the backedge graph actually uses now; `MethodInstance` is what + `--trace-compile` surfaces today. +2. Should `@record_calls` record transitively-inlined callees, or only + entry points the compiler was invoked on? 
For benchmarking we want the
+   latter; for introspection users may want the former.
+3. `evals > 1`: could we support it by re-invalidating between evals
+   *within* a sample? That would charge invalidation cost into the sample,
+   so probably no — force `evals=1`.
+4. Interaction with `--pkgimages=yes`: MIs loaded from a pkgimage are
+   memory-mapped read-only. `jl_mi_clear_native_code` must either copy
+   them out first or simply refuse; the former is preferable so TTFX-style
+   measurements work — and is required for the already-compiled case to
+   be useful, since most real-world code lives in pkgimages.
+5. Interaction with `Revise`: Revise relies on the current invalidation
+   API bumping world age. Our new path must not be confused with a
+   user-visible method edit. Keeping it as a separate C entry point (and
+   not going through `jl_method_table_disable`) achieves this.
+6. Entry-point resolution: to bootstrap the record pass when `foo(x)` is
+   already compiled, we need to turn the surface syntax `foo($x)` into the
+   `MethodInstance` that would be dispatched to. `Base.method_instance(f,
+   types)` (or equivalent via `which` + `Core.Compiler.specialize_method`)
+   is the right primitive; BenchmarkTools already has the arg tuple from
+   its quote/interpolation machinery.
+
+## Example (target UX)
+
+```julia
+julia> using BenchmarkTools
+
+julia> f(x) = sum(abs2, x) + prod(x .+ 1)
+f (generic function with 1 method)
+
+julia> @benchmark f($(rand(100))) compilation=true
+BenchmarkTools.Trial: 48 samples with 1 evaluation per sample.
+ Range (min … max): 92.1 ms … 138.4 ms ┊ GC (min … max): 0.0% … 3.1% + Time (median): 101.7 ms ┊ GC (median): 0.8% + Time (mean ± σ): 104.3 ms ± 8.9 ms ┊ GC (mean ± σ): 1.1% ± 1.3% + Compile: 98.2 ms (94.1%) Infer: 41.7 ms (40.0%) + Recompile: 0 ns Codegen: 56.5 ms (54.1%) + Methods recompiled per sample: 14 (± 0) +``` diff --git a/src/BenchmarkTools.jl b/src/BenchmarkTools.jl index 95e68a72..18126316 100644 --- a/src/BenchmarkTools.jl +++ b/src/BenchmarkTools.jl @@ -27,6 +27,7 @@ export loadparams! include("trials.jl") export gctime, + compiletime, memory, allocs, params, diff --git a/src/execution.jl b/src/execution.jl index 4458e2fa..cc62f19b 100644 --- a/src/execution.jl +++ b/src/execution.jl @@ -20,7 +20,7 @@ mutable struct Benchmark params::Parameters end -const SampleResult = Tuple{Float64,Float64,Int,Int} +const SampleResult = Tuple{Float64,Float64,Int,Int,Float64} params(b::Benchmark) = b.params @@ -119,7 +119,13 @@ function _run( ) params = Parameters(p; kwargs...) @assert params.seconds > 0.0 "time limit must be greater than 0.0" - sample_ref = Ref{SampleResult}((0.0, 0.0, 0, 0)) + if params.compilation + # Each sample recompiles the target; running it multiple times per sample + # would hit the cache for all but the first eval, so the cost would be + # amortized away. Force one eval per sample. 
+ params.evals = 1 + end + sample_ref = Ref{SampleResult}((0.0, 0.0, 0, 0, 0.0)) if warmup saved_evals = params.evals params.evals = 1 @@ -132,7 +138,7 @@ function _run( start_time = Base.time() b.samplefunc(b.quote_vals, params, sample_ref, nothing) s = sample_ref[] - push!(trial, s[1], s[2], s[3], s[4]) + push!(trial, s[1], s[2], s[3], s[4], s[5]) sample_time_s = s[1] * params.evals / 1e9 estimated_remaining = if sample_time_s > 0 min( @@ -144,12 +150,13 @@ function _run( end sizehint!(trial.times, 1 + estimated_remaining) sizehint!(trial.gctimes, 1 + estimated_remaining) + sizehint!(trial.compiletimes, 1 + estimated_remaining) iters = 2 while (Base.time() - start_time) < params.seconds && iters ≤ params.samples params.gcsample && s[4] > 0 && gcscrub() b.samplefunc(b.quote_vals, params, sample_ref, nothing) s = sample_ref[] - push!(trial, s[1], s[2], s[3], s[4]) + push!(trial, s[1], s[2], s[3], s[4], s[5]) iters += 1 end return_val = if capture_result @@ -217,7 +224,7 @@ function _lineartrial(b::Benchmark, p::Parameters=b.params; maxevals=RESOLUTION, params = Parameters(p; kwargs...) estimates = zeros(maxevals) completed = 0 - sample_ref = Ref{SampleResult}((0.0, 0.0, 0, 0)) + sample_ref = Ref{SampleResult}((0.0, 0.0, 0, 0, 0.0)) params.evals = 1 b.samplefunc(b.quote_vals, params, sample_ref, nothing) warmup_allocs = sample_ref[][4] @@ -654,6 +661,41 @@ function generate_benchmark_definition( ) $(setup) __evals = __params.evals + local __compiletime = 0.0 + if __params.compilation + # Scoped invalidation of the call's entry-point MethodInstance, + # then a single timed eval that captures wall + compile time. + # We call via `invokelatest` so the direct call site baked + # into this samplefunc's own machine code is bypassed — + # invalidation caps the CodeInstance but does not + # propagate to backedges (that would defeat the purpose), + # so a devirtualized call would keep using the stale fptr. 
+ __arg_types = Tuple{$([:(Core.Typeof($v)) for v in [quote_vars; setup_vars]]...)} + __mi = Base.method_instance($(corefunc), __arg_types) + if __mi !== nothing + Base.invalidate_calls(Core.MethodInstance[__mi]) + end + __stats = Base.@timed Base.invokelatest( + $(corefunc), $(quote_vars...), $(setup_vars...) + ) + $(teardown) + if __result_ref !== nothing + __result_ref[] = __stats.value + end + __gcdiff = __stats.gcstats + __time = max(__stats.time * 1e9 - __params.overhead, 0.001) + __gctime = max(__stats.gctime * 1e9 - __params.overhead, 0.0) + __memory = Int(__gcdiff.allocd) + __allocs = Int( + __gcdiff.malloc + + __gcdiff.realloc + + __gcdiff.poolalloc + + __gcdiff.bigalloc, + ) + __compiletime = __stats.compile_time * 1e9 + __sample_ref[] = (__time, __gctime, __memory, __allocs, __compiletime) + return nothing + end __gc_start = Base.gc_num() __start_time = time_ns() __return_val = $(invocation) @@ -678,7 +720,7 @@ function generate_benchmark_definition( __evals, ), ) - __sample_ref[] = (__time, __gctime, __memory, __allocs) + __sample_ref[] = (__time, __gctime, __memory, __allocs, __compiletime) return nothing end end, diff --git a/src/parameters.jl b/src/parameters.jl index ff1bc615..16ffd458 100644 --- a/src/parameters.jl +++ b/src/parameters.jl @@ -15,9 +15,10 @@ mutable struct Parameters gcsample::Bool time_tolerance::Float64 memory_tolerance::Float64 + compilation::Bool end -const DEFAULT_PARAMETERS = Parameters(5.0, 10000, 1, false, 0, true, false, 0.05, 0.01) +const DEFAULT_PARAMETERS = Parameters(5.0, 10000, 1, false, 0, true, false, 0.05, 0.01, false) function Parameters(; seconds=DEFAULT_PARAMETERS.seconds, @@ -29,6 +30,7 @@ function Parameters(; gcsample=DEFAULT_PARAMETERS.gcsample, time_tolerance=DEFAULT_PARAMETERS.time_tolerance, memory_tolerance=DEFAULT_PARAMETERS.memory_tolerance, + compilation=DEFAULT_PARAMETERS.compilation, ) return Parameters( seconds, @@ -40,6 +42,7 @@ function Parameters(; gcsample, time_tolerance, memory_tolerance, + 
compilation, ) end @@ -53,6 +56,7 @@ function Parameters( gcsample=nothing, time_tolerance=nothing, memory_tolerance=nothing, + compilation=nothing, ) params = Parameters() params.seconds = seconds != nothing ? seconds : default.seconds @@ -65,6 +69,7 @@ function Parameters( time_tolerance != nothing ? time_tolerance : default.time_tolerance params.memory_tolerance = memory_tolerance != nothing ? memory_tolerance : default.memory_tolerance + params.compilation = compilation != nothing ? compilation : default.compilation return params::BenchmarkTools.Parameters end @@ -76,7 +81,8 @@ function Base.:(==)(a::Parameters, b::Parameters) a.gctrial == b.gctrial && a.gcsample == b.gcsample && a.time_tolerance == b.time_tolerance && - a.memory_tolerance == b.memory_tolerance + a.memory_tolerance == b.memory_tolerance && + a.compilation == b.compilation end function Base.copy(p::Parameters) @@ -90,6 +96,7 @@ function Base.copy(p::Parameters) p.gcsample, p.time_tolerance, p.memory_tolerance, + p.compilation, ) end diff --git a/src/serialization.jl b/src/serialization.jl index 5aa7cced..67d7ee32 100644 --- a/src/serialization.jl +++ b/src/serialization.jl @@ -55,6 +55,11 @@ function recover(x::Vector) else xsi = if fn == "evals_set" && !haskey(fields, fn) false + elseif fn == "compilation" && !haskey(fields, fn) + false + elseif fn == "compiletimes" && !haskey(fields, fn) + # Old serialized Trials predate compile-time tracking; fill with zeros. 
+ zeros(Float64, length(get(fields, "times", Float64[]))) elseif fn in ("seconds", "overhead", "time_tolerance", "memory_tolerance") && fields[fn] === nothing # JSON spec doesn't support Inf diff --git a/src/trials.jl b/src/trials.jl index ec5627a8..a3cdd176 100644 --- a/src/trials.jl +++ b/src/trials.jl @@ -6,27 +6,34 @@ mutable struct Trial params::Parameters times::Vector{Float64} gctimes::Vector{Float64} + compiletimes::Vector{Float64} memory::Int allocs::Int end -Trial(params::Parameters) = Trial(params, Float64[], Float64[], typemax(Int), typemax(Int)) +Trial(params::Parameters) = Trial(params, Float64[], Float64[], Float64[], typemax(Int), typemax(Int)) + +# Backward-compatible 5-arg constructor (pre-compile-time-tracking). +Trial(params::Parameters, times::AbstractVector, gctimes::AbstractVector, memory::Integer, allocs::Integer) = + Trial(params, times, gctimes, zeros(Float64, length(times)), memory, allocs) function Base.:(==)(a::Trial, b::Trial) return a.params == b.params && a.times == b.times && a.gctimes == b.gctimes && + a.compiletimes == b.compiletimes && a.memory == b.memory && a.allocs == b.allocs end function Base.copy(t::Trial) - return Trial(copy(t.params), copy(t.times), copy(t.gctimes), t.memory, t.allocs) + return Trial(copy(t.params), copy(t.times), copy(t.gctimes), copy(t.compiletimes), t.memory, t.allocs) end -function Base.push!(t::Trial, time, gctime, memory, allocs) +function Base.push!(t::Trial, time, gctime, memory, allocs, compiletime=0.0) push!(t.times, time) push!(t.gctimes, gctime) + push!(t.compiletimes, compiletime) memory < t.memory && (t.memory = memory) allocs < t.allocs && (t.allocs = allocs) return t @@ -35,20 +42,22 @@ end function Base.deleteat!(t::Trial, i) deleteat!(t.times, i) deleteat!(t.gctimes, i) + deleteat!(t.compiletimes, i) return t end Base.length(t::Trial) = length(t.times) function Base.getindex(t::Trial, i::Number) - return push!(Trial(t.params), t.times[i], t.gctimes[i], t.memory, t.allocs) + return 
push!(Trial(t.params), t.times[i], t.gctimes[i], t.memory, t.allocs, t.compiletimes[i]) end -Base.getindex(t::Trial, i) = Trial(t.params, t.times[i], t.gctimes[i], t.memory, t.allocs) +Base.getindex(t::Trial, i) = Trial(t.params, t.times[i], t.gctimes[i], t.compiletimes[i], t.memory, t.allocs) Base.lastindex(t::Trial) = length(t) function Base.sort!(t::Trial) inds = sortperm(t.times) t.times = t.times[inds] t.gctimes = t.gctimes[inds] + t.compiletimes = t.compiletimes[inds] return t end @@ -56,6 +65,7 @@ Base.sort(t::Trial) = sort!(copy(t)) Base.time(t::Trial) = time(minimum(t)) gctime(t::Trial) = gctime(minimum(t)) +compiletime(t::Trial) = length(t.compiletimes) == 0 ? 0.0 : minimum(t.compiletimes) memory(t::Trial) = t.memory allocs(t::Trial) = t.allocs params(t::Trial) = t.params @@ -575,7 +585,23 @@ function Base.show(@nospecialize(io::IO), ::MIME"text/plain", t::Trial) print(io, ", allocs estimate") printstyled(io, ": "; color=:light_black) printstyled(io, allocsstr; color=:yellow) - return print(io, ".") + print(io, ".") + + # Compile-time info (only shown when compilation benchmarking was enabled) + if t.params.compilation && !isempty(t.compiletimes) + minct = minimum(t.compiletimes) + medct = median(t.compiletimes) + maxct = maximum(t.compiletimes) + print(io, "\n ") + printstyled(io, "Compile"; color=:yellow, bold=true) + printstyled(io, " (min … median … max): "; color=:light_black) + printstyled(io, prettytime(minct); color=:yellow) + print(io, " … ") + printstyled(io, prettytime(medct); color=:yellow, bold=true) + print(io, " … ") + printstyled(io, prettytime(maxct); color=:yellow) + end + return nothing end function Base.show(@nospecialize(io::IO), ::MIME"text/plain", t::TrialEstimate) diff --git a/test/ExecutionTests.jl b/test/ExecutionTests.jl index e78f0791..f5f0082c 100644 --- a/test/ExecutionTests.jl +++ b/test/ExecutionTests.jl @@ -405,7 +405,7 @@ GC.gc() # Ensure the harness itself doesn't allocate for a zero-allocation benchmark let b = 
@benchmarkable sin($(1)) tune!(b) - sample_ref = Ref{BenchmarkTools.SampleResult}((0.0, 0.0, 0, 0)) + sample_ref = Ref{BenchmarkTools.SampleResult}((0.0, 0.0, 0, 0, 0.0)) b.samplefunc(b.quote_vals, b.params, sample_ref, nothing) s = sample_ref[] @test s[4] == 0 # allocs @@ -420,4 +420,26 @@ end @test_throws MethodError ratio() @test_throws MethodError judge() +######################### +# compilation=true mode # +######################### +# Benchmark compilation time by invalidating the target call's entry-point +# MethodInstance between samples so each sample forces a fresh codegen. +if isdefined(Base, :invalidate_calls) + compile_bench_target(x) = sum(abs2, x) + prod(x .+ one(eltype(x))) + # Warm up to ensure we exercise the already-compiled path. + compile_bench_target(rand(8)) + let t = @benchmark compile_bench_target($(rand(8))) samples=5 seconds=30 compilation=true evals=1 + @test length(t) >= 2 + @test t.params.compilation + @test t.params.evals == 1 + @test length(t.compiletimes) == length(t.times) + # At least one sample should observe non-zero compile time. Allow a + # generous tolerance since the underlying invalidation + codegen is + # stochastic across the harness. + @test any(>(0), t.compiletimes) + @test compiletime(t) >= 0.0 + end +end + end # module