diff --git a/docs/design/compilation-benchmarking.md b/docs/design/compilation-benchmarking.md new file mode 100644 index 00000000..6fe3ac2b --- /dev/null +++ b/docs/design/compilation-benchmarking.md @@ -0,0 +1,307 @@ +# Design: Benchmarking compilation with `@benchmark` + +## Motivation + +Today `@benchmark foo($x)` measures steady-state runtime. Compilation of `foo` +(and its call tree) happens at most once, either in the warm-up call or in the +first sample. This means: + +- Compile-time is not reported in a statistically meaningful way. +- Users wanting to characterize "time-to-first-execution" or track + regressions in inference/codegen cost have to roll their own harness + (typically using `@time` in a fresh process, or `SnoopCompile`). + +Goal: add a first-class mode + +```julia +@benchmark foo($x) compilation=true +``` + +that produces a `Trial` whose samples are compilation times (and optionally +inference / LLVM / allocations breakdown), with all the existing statistics +machinery (min/median/mean/std, tuning, comparison, regression detection). + +## Requirements + +1. **Repeatability**: each sample must actually recompile the code under + measurement. Running the same expression twice without intervention will + hit the cache on the second call. +2. **Scoped invalidation**: we must *only* invalidate methods reachable from + the expression under test. Blowing away all caches (`jl_drop_all_caches`) + would force `BenchmarkTools`, the REPL, and the test harness itself to + recompile on every sample, making the measurement meaningless and orders + of magnitude slower than the thing we want to measure. +3. **Low measurement overhead**: the recording/invalidation step is *not* + part of the sample; only the recompile + run is timed. +4. **Composable with existing `Parameters`**: `samples`, `evals`, + `seconds`, `gctrial`, etc. should continue to work. `evals` per sample + should probably be forced to 1 (each eval would otherwise share a cache). +5. 
**No dependency on external packages** (SnoopCompile, Cthulhu). The core
+   capability must live in `Base`/`Core` so that `BenchmarkTools` can depend
+   only on the standard library.
+
+## Proposed surface
+
+### In `Base` (new, internal-but-public)
+
+```julia
+# Returns a collection of MethodInstance (or CodeInstance) objects that were
+# actually executed while running `ex`. Equivalent in spirit to a
+# `--trace-compile` trace, but captured in-process and returning live handles
+# rather than strings. (Bound to `mis` rather than `methods` so the example
+# does not shadow `Base.methods`.)
+mis = Base.@record_calls foo(x)
+
+# Drops native code + inferred IR for the given MethodInstances, such that
+# the next dispatch to each one will re-infer and re-codegen.
+# Does NOT bump the global world age and does NOT touch any MI not in the set.
+Base.invalidate_calls(mis)
+```
+
+Optionally a convenience:
+
+```julia
+Base.@with_recompilation foo(x)  # record, invalidate, return @timed result
+```
+
+### In `BenchmarkTools`
+
+```julia
+@benchmark foo($x) compilation=true
+@benchmark foo($x) compilation=:full            # infer + codegen (default)
+@benchmark foo($x) compilation=:codegen_only    # keep inferred IR, drop native
+@benchmark foo($x) compilation=:inference_only
+```
+
+Trial samples would store (time_ns, compile_time_ns, recompile_time_ns,
+inference_time_ns, gc_time_ns, bytes, allocs) — essentially the `NamedTuple`
+already produced by `@timed`.
+
+## Design options
+
+The hard part is requirements 1 and 2 together: *scoped* invalidation. Four
+options follow, roughly in increasing order of invasiveness in Base.
+
+### Option A: Re-eval in a fresh anonymous module per sample
+
+Sketch: wrap the expression in `@eval Module() begin ... end`. Each sample
+defines a new closure in a throwaway module, which forces codegen for the
+wrapper. The inner callee (`foo`) is still cached, though — so this only
+measures specialization of the wrapper, not of `foo` itself. Rejected as
+insufficient.
+
+### Option B: Global cache drop per sample
+
+`Base.drop_all_caches()` already exists.
Pros: trivial to implement, no new
+API. Cons:
+
+- Recompiles everything the harness touches between samples (printing,
+  timing, `Statistics.quantile`, ...). Samples become dominated by
+  harness recompilation, not by `foo`.
+- The world-age bump changes the semantics of captured closures.
+- Samples are not independent: the Nth sample recompiles strictly less than
+  the 1st because some harness code stays hot.
+
+Useful as a `compilation=:nuclear` debugging escape hatch, but not the
+default.
+
+### Option C: `--trace-compile` hook + per-MI invalidation (recommended)
+
+Two new pieces of machinery:
+
+**C.1 Recording.** Expose the existing trace-compile infrastructure as an
+in-process callback rather than a text stream. The C runtime already
+notifies on every `jl_generate_fptr` / inference entry (see
+`jl_force_trace_compile_timing_enable` in `base/timing.jl` and the
+`trace_compile` option). Add:
+
+```c
+// src/gf.c / codegen.cpp
+JL_DLLEXPORT void jl_set_trace_compile_callback(
+    void (*cb)(jl_method_instance_t*, int /*is_recompile*/, void*),
+    void *ctx);
+```
+
+and a Julia wrapper:
+
+```julia
+# base/reflection.jl or base/compiler/...
+function record_calls(f)
+    seen = IdSet{Core.MethodInstance}()
+    # The callback arity mirrors the C signature: (mi, is_recompile).
+    cb = (mi, is_recompile) -> push!(seen, mi)
+    prev = _set_trace_compile_callback(cb)
+    try
+        Base.invokelatest(f)
+    finally
+        _set_trace_compile_callback(prev)
+    end
+    return seen
+end
+
+macro record_calls(ex)
+    :(record_calls(() -> $(esc(ex))))
+end
+```
+
+This piggybacks on infrastructure that already exists for
+`--trace-compile` and `Base.@trace_compile`. No new instrumentation points
+in the compiler.
+
+**Handling already-compiled code.** A critical subtlety: by the time the
+user types `@btime foo($x) compilation=true`, `foo(x)` may already be fully
+compiled (from an earlier REPL call, from precompilation, or from a
+pkgimage). A naive `@record_calls foo(x)` would then observe *nothing*,
+because the trace-compile callback only fires on actual codegen.
The
+recording pass must therefore force a compile of the target expression, not
+just run it. Two strategies:
+
+1. **Record-by-invalidate-then-run (recommended).** Start with a sentinel
+   MI set (e.g. the entry-point `MethodInstance` of the call `foo(x)`,
+   obtainable via `Base.method_instance` / `Core.Compiler.specialize_method`).
+   Invalidate that single MI, then run `foo(x)` under the trace-compile
+   callback. Because the entry point is now uncompiled, dispatching to it
+   re-enters codegen, which in turn recursively forces codegen of any of
+   its callees whose native code has been dropped — and, crucially, also
+   reports any callees that were *already* compiled but had to be
+   re-specialized. Callees that stay cached will not appear, but that is
+   the correct answer: we don't want to recompile them on every sample
+   either. The captured `seen` is then the exact set we re-invalidate per
+   sample.
+2. **Snapshot + diff.** Snapshot all `MethodInstance`s (or just those
+   reachable via `Base.specializations` from the target method) at entry,
+   run `f`, diff. This works even if nothing new compiles — the "diff" is
+   empty and we fall back to `{entry_point_mi}` alone. Simpler, but it
+   misses indirect callees that were already compiled.
+
+Strategy (1) records in the same spirit as `SnoopCompile.@snoopi_deep`, and
+it handles the already-compiled case correctly: the forced invalidation of
+the entry point guarantees at least one codegen event, which then cascades.
+
+In both strategies the first call in the benchmark sequence does *double
+duty*: it populates `seen` and produces the first sample. Subsequent
+samples just run `invalidate_calls(seen); @timed foo(x)` in a loop.
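+
+The per-sample sequence can be sketched end to end. This is illustrative
+pseudocode written against the *proposed* primitives (`Base.record_calls`,
+`Base.invalidate_calls`, and `Base.method_instance` do not exist yet, and
+the helper name `compile_time_samples` is made up for the sketch):
+
+```julia
+function compile_time_samples(f, args...; n::Integer=10)
+    times = Float64[]
+    # Sample 1 does double duty: invalidate the entry point so codegen is
+    # forced, record every MethodInstance compiled along the way, and time it.
+    mi = Base.method_instance(f, Tuple{map(Core.Typeof, args)...})
+    Base.invalidate_calls(Core.MethodInstance[mi])
+    local seen
+    stats = Base.@timed (seen = Base.record_calls(() -> f(args...)))
+    push!(times, stats.time)
+    # Samples 2..n: flush exactly the recorded set, then time one fresh call.
+    # `invokelatest` keeps the call from devirtualizing into this function's
+    # own native code, so dispatch re-enters the invalidated MI.
+    for _ in 2:n
+        Base.invalidate_calls(seen)
+        stats = Base.@timed Base.invokelatest(f, args...)
+        push!(times, stats.time)
+    end
+    return times
+end
+```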
**C.2 Invalidation.** Expose per-MI cache dropping:
+
+```c
+JL_DLLEXPORT void jl_mi_clear_native_code(jl_method_instance_t *mi);
+JL_DLLEXPORT void jl_mi_clear_inferred(jl_method_instance_t *mi);
+```
+
+and
+
+```julia
+function invalidate_calls(mis; inferred::Bool=true, native::Bool=true)
+    for mi in mis
+        native && ccall(:jl_mi_clear_native_code, Cvoid, (Any,), mi)
+        inferred && ccall(:jl_mi_clear_inferred, Cvoid, (Any,), mi)
+    end
+end
+```
+
+The implementation can lean on `invalidate_method_instance_caches` already
+present in `src/gf.c`, but *without* the world-age bump that
+`jl_method_table_disable` performs — we are not making a semantic change,
+just dropping cached results. This is the key novelty: today's invalidation
+APIs all assume the reason for invalidation is a method edit, so they bump
+the world. For benchmarking we want a pure cache flush.
+
+Concerns and how to address them:
+
+- **`@generated` functions / cfunctions / `precompile`d code**: some MIs
+  are pinned. `invalidate_calls` should silently skip what it cannot drop
+  and optionally report it. BenchmarkTools would surface a warning like
+  `"17/342 methods could not be invalidated and will not be re-timed"`.
+- **Backedges**: dropping native code for `mi` does not need to propagate
+  along backedges: dynamically dispatched callers will re-enter `mi` and
+  trigger codegen on demand. Call sites that were devirtualized to a direct
+  function pointer *do* keep the stale code, which is why the benchmark
+  harness must enter the invalidated entry point via `invokelatest` rather
+  than through a baked-in call.
+- **Concurrency / world age**: since we do not bump the world, other tasks
+  can keep running; they will just pay codegen cost if they happen to call
+  one of the invalidated MIs concurrently. BenchmarkTools already assumes
+  sole ownership of the machine during a sample, so this is acceptable.
+- **Inlined callees**: if `bar` was inlined into `foo`, dropping `bar`'s
+  native code does nothing — the code is duplicated inside `foo`'s native
+  image. Dropping `foo` handles this correctly.
This matches user + expectation: `@benchmark foo($x) compilation=true` should measure the + cost of compiling `foo` (with its current inlining decisions), not each + inlinee independently. + +### Option D: Process-level isolation + +Run each sample in a fresh `julia` subprocess with +`--compile=all --trace-compile=...`, parse timings out of stdout. This is +what benchmarking-for-TTFX tools (`PkgEval`, `SnoopCompile`'s +`@snoopi_deep` with `flamegraph`) effectively do. + +Pros: perfectly isolated, no API surface in Base. +Cons: + +- Sample time ≈ 1–5 s of Julia startup + sysimage load, dominating what + we want to measure for anything small. +- Cannot interpolate live Julia values (`$x`); would need serialization. +- Poor fit for `@benchmark`'s sampling loop. + +Reasonable as a future `@benchmark_compile_isolated` macro, not as the +primary mechanism. + +## Recommendation + +Implement **Option C** in two PRs: + +1. **julia PR**: add `Base.@record_calls` / `Base.record_calls` and + `Base.invalidate_calls` (names open for bikeshedding — `Base.Compiler` + may be a better home). Internally reuse the trace-compile callback + machinery and `invalidate_method_instance_caches`. Mark them as + experimental (`Base.Experimental`) for the first release. + +2. **BenchmarkTools PR**: add a `compilation::Union{Bool,Symbol}` + parameter to `Parameters`, wire it through `Benchmark.sample`, and + store per-sample compile/infer/codegen/gc times. `evals` is forced + to 1 when `compilation !== false`. Add a `ratio`/`judge` path for + comparing compile-time trials just like runtime trials. + +## Open questions + +1. Should `invalidate_calls` take `MethodInstance`s, `CodeInstance`s, or + both? `CodeInstance` is finer-grained (per-world, per-signature) and is + what the backedge graph actually uses now; `MethodInstance` is what + `--trace-compile` surfaces today. +2. Should `@record_calls` record transitively-inlined callees, or only + entry points the compiler was invoked on? 
For benchmarking we want the
+   latter; for introspection users may want the former.
+3. `evals > 1`: could we support it by re-invalidating between evals
+   *within* a sample? That would charge invalidation cost into the sample,
+   so probably no — force `evals=1`.
+4. Interaction with `--pkgimages=yes`: MIs loaded from a pkgimage are
+   memory-mapped read-only. `jl_mi_clear_native_code` must either copy
+   them out first or simply refuse; the former is preferable so TTFX-style
+   measurements work — and is required for the already-compiled case to
+   be useful, since most real-world code lives in pkgimages.
+5. Interaction with `Revise`: Revise relies on the current invalidation
+   API bumping world age. Our new path must not be confused with a
+   user-visible method edit. Keeping it as a separate C entry point (and
+   not going through `jl_method_table_disable`) achieves this.
+6. Entry-point resolution: to bootstrap the record pass when `foo(x)` is
+   already compiled, we need to turn the surface syntax `foo($x)` into the
+   `MethodInstance` that would be dispatched to. `Base.method_instance(f,
+   types)` (or equivalent via `which` + `Core.Compiler.specialize_method`)
+   is the right primitive; BenchmarkTools already has the arg tuple from
+   its quote/interpolation machinery.
+
+## Example (target UX)
+
+```julia
+julia> using BenchmarkTools
+
+julia> f(x) = sum(abs2, x) + prod(x .+ 1)
+f (generic function with 1 method)
+
+julia> @benchmark f($(rand(100))) compilation=true
+BenchmarkTools.Trial: 48 samples with 1 evaluation per sample.
+ Range (min … max): 92.1 ms … 138.4 ms ┊ GC (min … max): 0.0% … 3.1% + Time (median): 101.7 ms ┊ GC (median): 0.8% + Time (mean ± σ): 104.3 ms ± 8.9 ms ┊ GC (mean ± σ): 1.1% ± 1.3% + Compile: 98.2 ms (94.1%) Infer: 41.7 ms (40.0%) + Recompile: 0 ns Codegen: 56.5 ms (54.1%) + Methods recompiled per sample: 14 (± 0) +``` diff --git a/src/BenchmarkTools.jl b/src/BenchmarkTools.jl index 95e68a72..18126316 100644 --- a/src/BenchmarkTools.jl +++ b/src/BenchmarkTools.jl @@ -27,6 +27,7 @@ export loadparams! include("trials.jl") export gctime, + compiletime, memory, allocs, params, diff --git a/src/execution.jl b/src/execution.jl index 4458e2fa..cc62f19b 100644 --- a/src/execution.jl +++ b/src/execution.jl @@ -20,7 +20,7 @@ mutable struct Benchmark params::Parameters end -const SampleResult = Tuple{Float64,Float64,Int,Int} +const SampleResult = Tuple{Float64,Float64,Int,Int,Float64} params(b::Benchmark) = b.params @@ -119,7 +119,13 @@ function _run( ) params = Parameters(p; kwargs...) @assert params.seconds > 0.0 "time limit must be greater than 0.0" - sample_ref = Ref{SampleResult}((0.0, 0.0, 0, 0)) + if params.compilation + # Each sample recompiles the target; running it multiple times per sample + # would hit the cache for all but the first eval, so the cost would be + # amortized away. Force one eval per sample. 
+ params.evals = 1 + end + sample_ref = Ref{SampleResult}((0.0, 0.0, 0, 0, 0.0)) if warmup saved_evals = params.evals params.evals = 1 @@ -132,7 +138,7 @@ function _run( start_time = Base.time() b.samplefunc(b.quote_vals, params, sample_ref, nothing) s = sample_ref[] - push!(trial, s[1], s[2], s[3], s[4]) + push!(trial, s[1], s[2], s[3], s[4], s[5]) sample_time_s = s[1] * params.evals / 1e9 estimated_remaining = if sample_time_s > 0 min( @@ -144,12 +150,13 @@ function _run( end sizehint!(trial.times, 1 + estimated_remaining) sizehint!(trial.gctimes, 1 + estimated_remaining) + sizehint!(trial.compiletimes, 1 + estimated_remaining) iters = 2 while (Base.time() - start_time) < params.seconds && iters ≤ params.samples params.gcsample && s[4] > 0 && gcscrub() b.samplefunc(b.quote_vals, params, sample_ref, nothing) s = sample_ref[] - push!(trial, s[1], s[2], s[3], s[4]) + push!(trial, s[1], s[2], s[3], s[4], s[5]) iters += 1 end return_val = if capture_result @@ -217,7 +224,7 @@ function _lineartrial(b::Benchmark, p::Parameters=b.params; maxevals=RESOLUTION, params = Parameters(p; kwargs...) estimates = zeros(maxevals) completed = 0 - sample_ref = Ref{SampleResult}((0.0, 0.0, 0, 0)) + sample_ref = Ref{SampleResult}((0.0, 0.0, 0, 0, 0.0)) params.evals = 1 b.samplefunc(b.quote_vals, params, sample_ref, nothing) warmup_allocs = sample_ref[][4] @@ -654,6 +661,41 @@ function generate_benchmark_definition( ) $(setup) __evals = __params.evals + local __compiletime = 0.0 + if __params.compilation + # Scoped invalidation of the call's entry-point MethodInstance, + # then a single timed eval that captures wall + compile time. + # We call via `invokelatest` so the direct call site baked + # into this samplefunc's own machine code is bypassed — + # invalidation caps the CodeInstance but does not + # propagate to backedges (that would defeat the purpose), + # so a devirtualized call would keep using the stale fptr. 
+ __arg_types = Tuple{$([:(Core.Typeof($v)) for v in [quote_vars; setup_vars]]...)} + __mi = Base.method_instance($(corefunc), __arg_types) + if __mi !== nothing + Base.invalidate_calls(Core.MethodInstance[__mi]) + end + __stats = Base.@timed Base.invokelatest( + $(corefunc), $(quote_vars...), $(setup_vars...) + ) + $(teardown) + if __result_ref !== nothing + __result_ref[] = __stats.value + end + __gcdiff = __stats.gcstats + __time = max(__stats.time * 1e9 - __params.overhead, 0.001) + __gctime = max(__stats.gctime * 1e9 - __params.overhead, 0.0) + __memory = Int(__gcdiff.allocd) + __allocs = Int( + __gcdiff.malloc + + __gcdiff.realloc + + __gcdiff.poolalloc + + __gcdiff.bigalloc, + ) + __compiletime = __stats.compile_time * 1e9 + __sample_ref[] = (__time, __gctime, __memory, __allocs, __compiletime) + return nothing + end __gc_start = Base.gc_num() __start_time = time_ns() __return_val = $(invocation) @@ -678,7 +720,7 @@ function generate_benchmark_definition( __evals, ), ) - __sample_ref[] = (__time, __gctime, __memory, __allocs) + __sample_ref[] = (__time, __gctime, __memory, __allocs, __compiletime) return nothing end end, diff --git a/src/parameters.jl b/src/parameters.jl index ff1bc615..16ffd458 100644 --- a/src/parameters.jl +++ b/src/parameters.jl @@ -15,9 +15,10 @@ mutable struct Parameters gcsample::Bool time_tolerance::Float64 memory_tolerance::Float64 + compilation::Bool end -const DEFAULT_PARAMETERS = Parameters(5.0, 10000, 1, false, 0, true, false, 0.05, 0.01) +const DEFAULT_PARAMETERS = Parameters(5.0, 10000, 1, false, 0, true, false, 0.05, 0.01, false) function Parameters(; seconds=DEFAULT_PARAMETERS.seconds, @@ -29,6 +30,7 @@ function Parameters(; gcsample=DEFAULT_PARAMETERS.gcsample, time_tolerance=DEFAULT_PARAMETERS.time_tolerance, memory_tolerance=DEFAULT_PARAMETERS.memory_tolerance, + compilation=DEFAULT_PARAMETERS.compilation, ) return Parameters( seconds, @@ -40,6 +42,7 @@ function Parameters(; gcsample, time_tolerance, memory_tolerance, + 
compilation, ) end @@ -53,6 +56,7 @@ function Parameters( gcsample=nothing, time_tolerance=nothing, memory_tolerance=nothing, + compilation=nothing, ) params = Parameters() params.seconds = seconds != nothing ? seconds : default.seconds @@ -65,6 +69,7 @@ function Parameters( time_tolerance != nothing ? time_tolerance : default.time_tolerance params.memory_tolerance = memory_tolerance != nothing ? memory_tolerance : default.memory_tolerance + params.compilation = compilation != nothing ? compilation : default.compilation return params::BenchmarkTools.Parameters end @@ -76,7 +81,8 @@ function Base.:(==)(a::Parameters, b::Parameters) a.gctrial == b.gctrial && a.gcsample == b.gcsample && a.time_tolerance == b.time_tolerance && - a.memory_tolerance == b.memory_tolerance + a.memory_tolerance == b.memory_tolerance && + a.compilation == b.compilation end function Base.copy(p::Parameters) @@ -90,6 +96,7 @@ function Base.copy(p::Parameters) p.gcsample, p.time_tolerance, p.memory_tolerance, + p.compilation, ) end diff --git a/src/serialization.jl b/src/serialization.jl index 5aa7cced..67d7ee32 100644 --- a/src/serialization.jl +++ b/src/serialization.jl @@ -55,6 +55,11 @@ function recover(x::Vector) else xsi = if fn == "evals_set" && !haskey(fields, fn) false + elseif fn == "compilation" && !haskey(fields, fn) + false + elseif fn == "compiletimes" && !haskey(fields, fn) + # Old serialized Trials predate compile-time tracking; fill with zeros. 
+ zeros(Float64, length(get(fields, "times", Float64[]))) elseif fn in ("seconds", "overhead", "time_tolerance", "memory_tolerance") && fields[fn] === nothing # JSON spec doesn't support Inf diff --git a/src/trials.jl b/src/trials.jl index ec5627a8..a3cdd176 100644 --- a/src/trials.jl +++ b/src/trials.jl @@ -6,27 +6,34 @@ mutable struct Trial params::Parameters times::Vector{Float64} gctimes::Vector{Float64} + compiletimes::Vector{Float64} memory::Int allocs::Int end -Trial(params::Parameters) = Trial(params, Float64[], Float64[], typemax(Int), typemax(Int)) +Trial(params::Parameters) = Trial(params, Float64[], Float64[], Float64[], typemax(Int), typemax(Int)) + +# Backward-compatible 5-arg constructor (pre-compile-time-tracking). +Trial(params::Parameters, times::AbstractVector, gctimes::AbstractVector, memory::Integer, allocs::Integer) = + Trial(params, times, gctimes, zeros(Float64, length(times)), memory, allocs) function Base.:(==)(a::Trial, b::Trial) return a.params == b.params && a.times == b.times && a.gctimes == b.gctimes && + a.compiletimes == b.compiletimes && a.memory == b.memory && a.allocs == b.allocs end function Base.copy(t::Trial) - return Trial(copy(t.params), copy(t.times), copy(t.gctimes), t.memory, t.allocs) + return Trial(copy(t.params), copy(t.times), copy(t.gctimes), copy(t.compiletimes), t.memory, t.allocs) end -function Base.push!(t::Trial, time, gctime, memory, allocs) +function Base.push!(t::Trial, time, gctime, memory, allocs, compiletime=0.0) push!(t.times, time) push!(t.gctimes, gctime) + push!(t.compiletimes, compiletime) memory < t.memory && (t.memory = memory) allocs < t.allocs && (t.allocs = allocs) return t @@ -35,20 +42,22 @@ end function Base.deleteat!(t::Trial, i) deleteat!(t.times, i) deleteat!(t.gctimes, i) + deleteat!(t.compiletimes, i) return t end Base.length(t::Trial) = length(t.times) function Base.getindex(t::Trial, i::Number) - return push!(Trial(t.params), t.times[i], t.gctimes[i], t.memory, t.allocs) + return 
push!(Trial(t.params), t.times[i], t.gctimes[i], t.memory, t.allocs, t.compiletimes[i]) end -Base.getindex(t::Trial, i) = Trial(t.params, t.times[i], t.gctimes[i], t.memory, t.allocs) +Base.getindex(t::Trial, i) = Trial(t.params, t.times[i], t.gctimes[i], t.compiletimes[i], t.memory, t.allocs) Base.lastindex(t::Trial) = length(t) function Base.sort!(t::Trial) inds = sortperm(t.times) t.times = t.times[inds] t.gctimes = t.gctimes[inds] + t.compiletimes = t.compiletimes[inds] return t end @@ -56,6 +65,7 @@ Base.sort(t::Trial) = sort!(copy(t)) Base.time(t::Trial) = time(minimum(t)) gctime(t::Trial) = gctime(minimum(t)) +compiletime(t::Trial) = length(t.compiletimes) == 0 ? 0.0 : minimum(t.compiletimes) memory(t::Trial) = t.memory allocs(t::Trial) = t.allocs params(t::Trial) = t.params @@ -575,7 +585,23 @@ function Base.show(@nospecialize(io::IO), ::MIME"text/plain", t::Trial) print(io, ", allocs estimate") printstyled(io, ": "; color=:light_black) printstyled(io, allocsstr; color=:yellow) - return print(io, ".") + print(io, ".") + + # Compile-time info (only shown when compilation benchmarking was enabled) + if t.params.compilation && !isempty(t.compiletimes) + minct = minimum(t.compiletimes) + medct = median(t.compiletimes) + maxct = maximum(t.compiletimes) + print(io, "\n ") + printstyled(io, "Compile"; color=:yellow, bold=true) + printstyled(io, " (min … median … max): "; color=:light_black) + printstyled(io, prettytime(minct); color=:yellow) + print(io, " … ") + printstyled(io, prettytime(medct); color=:yellow, bold=true) + print(io, " … ") + printstyled(io, prettytime(maxct); color=:yellow) + end + return nothing end function Base.show(@nospecialize(io::IO), ::MIME"text/plain", t::TrialEstimate) diff --git a/test/ExecutionTests.jl b/test/ExecutionTests.jl index e78f0791..f5f0082c 100644 --- a/test/ExecutionTests.jl +++ b/test/ExecutionTests.jl @@ -405,7 +405,7 @@ GC.gc() # Ensure the harness itself doesn't allocate for a zero-allocation benchmark let b = 
@benchmarkable sin($(1)) tune!(b) - sample_ref = Ref{BenchmarkTools.SampleResult}((0.0, 0.0, 0, 0)) + sample_ref = Ref{BenchmarkTools.SampleResult}((0.0, 0.0, 0, 0, 0.0)) b.samplefunc(b.quote_vals, b.params, sample_ref, nothing) s = sample_ref[] @test s[4] == 0 # allocs @@ -420,4 +420,26 @@ end @test_throws MethodError ratio() @test_throws MethodError judge() +######################### +# compilation=true mode # +######################### +# Benchmark compilation time by invalidating the target call's entry-point +# MethodInstance between samples so each sample forces a fresh codegen. +if isdefined(Base, :invalidate_calls) + compile_bench_target(x) = sum(abs2, x) + prod(x .+ one(eltype(x))) + # Warm up to ensure we exercise the already-compiled path. + compile_bench_target(rand(8)) + let t = @benchmark compile_bench_target($(rand(8))) samples=5 seconds=30 compilation=true evals=1 + @test length(t) >= 2 + @test t.params.compilation + @test t.params.evals == 1 + @test length(t.compiletimes) == length(t.times) + # At least one sample should observe non-zero compile time. Allow a + # generous tolerance since the underlying invalidation + codegen is + # stochastic across the harness. + @test any(>(0), t.compiletimes) + @test compiletime(t) >= 0.0 + end +end + end # module