Speed up gettext: linear template merge, single POT read, plural-info dedup, and runtime interpolation fast paths #2

Open
oliver-kriska wants to merge 5 commits into extract-from-attributes from extraction-merge-perf

oliver-kriska (Owner) commented Jun 16, 2026

Five small, output-preserving optimizations across the extraction/compile path and the runtime interpolation path, stacked on top of the --from-attributes work (extract-from-attributes). Each change is output-equivalent: the extraction changes (1-3) were verified to produce byte-identical PO/POT output against a production app's catalog (~1,875 files, 11 domains × 16 locales, ~4,700 msgids), and the runtime changes (4-5) are covered by the existing interpolation test suite.

This PR targets the fork's extract-from-attributes branch so the diff shows only the new commits. Opened to review the diff and dogfood it in CI/dev before considering an upstream PR.

Commits

  1. Make POT template merge linear instead of O(n*m) - merge_template/3 matched messages with Expo.Messages.find/2 (a linear scan) once per existing message and once per new message, so it was O(N*M) per domain on every extract (and again in prune_unmerged/2). It now indexes both sides by Expo.Message.key/1 and looks matches up in a map / MapSet. Message.key/1 is exactly what Message.same?/2 (hence Messages.find/2) compares on, and Map.put_new/3 preserves Enum.find/2's first-match-wins, so output is unchanged.

  2. Avoid reading each existing POT file twice - read_contents_and_parse/1 did File.read!/1 and then PO.parse_file!/2, which reads the same file again. PO.parse_file!(path, file: path) is defined as File.read + parse_string(contents, file: path), so feeding the bytes we already hold to PO.parse_string!/2 is identical; File.read!/1 still guards a missing file.

  3. Compute plural info once per PO file - compile_po_file/5 called Plural.plural_info/3 twice (in nplurals/3 and compile_plural_forms/4). That function runs Code.ensure_compiled!/1 and parses the Plural-Forms header. It is computed once now; the generated code is identical.

  4. Stop recompiling the interpolation patterns on every call - to_interpolatable/1 called :binary.compile_pattern/1 for "%{" and "}" on every invocation. This runs on every runtime interpolation, including the common case where the current locale has no translation for a msgid and falls back to interpolating the msgid itself. It now matches on the literal "%{" / "}" patterns directly in :binary.split/2; the split result is identical, and these short literal patterns (two bytes and one byte) gain nothing from precompilation (and a compiled pattern can't be hoisted to a module attribute - it's a reference).

  5. Skip String.Chars dispatch for binary bindings - the runtime interpolation path always ran binding values through to_string/1, dispatching the String.Chars protocol even when the value is already a binary (the common case). An is_binary/1 clause now uses the value directly; to_string/1 on a binary returns it unchanged, so the result is identical.

Benchmark (commit 1)

The merge runs in both the stock and --from-attributes paths, so this win applies regardless of the flag. This script isolates the matching cost (the merge step is reduced to identity so both implementations do the same number of calls) and asserts identical result counts. Run it with mix run, optionally setting BENCH_POT to a real .pot/.po path to add a real-catalog case:

alias Expo.{Message, Messages, PO}

defmodule OldMerge do
  def run(existing, new) do
    old_and_merged =
      Enum.flat_map(existing.messages, fn message ->
        cond do
          same = Messages.find(new, message) -> [keep(message, same)]
          true -> [message]
        end
      end)

    old_and_merged ++ Enum.reject(new.messages, &Messages.find(existing, &1))
  end

  defp keep(old, _new), do: old
end

defmodule NewMerge do
  def run(existing, new) do
    new_by_key =
      Enum.reduce(new.messages, %{}, fn message, acc ->
        Map.put_new(acc, Message.key(message), message)
      end)

    existing_keys = MapSet.new(existing.messages, &Message.key/1)

    old_and_merged =
      Enum.flat_map(existing.messages, fn message ->
        cond do
          same = Map.get(new_by_key, Message.key(message)) -> [keep(message, same)]
          true -> [message]
        end
      end)

    old_and_merged ++
      Enum.reject(new.messages, &MapSet.member?(existing_keys, Message.key(&1)))
  end

  defp keep(old, _new), do: old
end

time = fn iters, fun ->
  fun.()
  {us, _} = :timer.tc(fn -> Enum.each(1..iters, fn _ -> fun.() end) end)
  us / iters / 1000.0
end

synthetic = fn n ->
  %Messages{messages: for(i <- 1..n, do: %Message.Singular{msgid: ["msgid number #{i}"], msgstr: [""]})}
end

bench = fn label, existing, new, iters ->
  old_ms = time.(iters, fn -> OldMerge.run(existing, new) end)
  new_ms = time.(iters, fn -> NewMerge.run(existing, new) end)
  equal? = length(OldMerge.run(existing, new)) == length(NewMerge.run(existing, new))
  IO.puts("#{String.pad_trailing(label, 22)} old=#{Float.round(old_ms, 2)} ms  " <>
            "new=#{Float.round(new_ms, 3)} ms  equal_count=#{equal?}")
end

case System.get_env("BENCH_POT") do
  nil -> :ok
  path ->
    pot = PO.parse_file!(Path.expand(path))
    bench.("real (all match)", pot, pot, 20)
    fresh = %Messages{messages: for(m <- pot.messages, do: %{m | msgid: ["NEW " | m.msgid]})}
    bench.("real (all new)", pot, fresh, 20)
end

for n <- [500, 1000, 2000, 4000, 8000], do: bench.("n=#{n}", synthetic.(n), synthetic.(n), max(3, div(40_000, n)))

Results on a real default.pot (~4,200 msgids), 3 fresh-VM runs, warmup + 20-iteration average, equal_count true throughout:

| input | old | new |
| --- | --- | --- |
| real catalog, all match (steady-state re-extract) | ~1.0 s | ~3 ms |
| real catalog, all new (worst case) | ~3.7 s | ~3 ms |
| synthetic n=500 / 1k / 2k / 4k / 8k | 14 / 51 / 230 / 800 / 3235 ms | 0.23 / 0.46 / 1.0 / 2.1 / 4.3 ms |

The old time roughly quadruples with each doubling of n (O(n²)); the new one roughly doubles (O(n)). End-to-end, the stock (no-flag) extract is dominated by the force-recompile, so the merge saving there is small in absolute terms but grows with catalog size; on the recompile-free --from-attributes path it is proportionally more noticeable.
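
As a rough check of that growth claim, the ratios between successive synthetic timings can be computed directly from the numbers in the table above. This is only an illustrative sketch: `GrowthSketch` is a hypothetical helper and not part of the PR.

```elixir
defmodule GrowthSketch do
  # Ratio of each timing to the previous one, rounded to two decimals.
  # Each step in the synthetic series doubles n, so a ratio near 4
  # suggests quadratic growth and a ratio near 2 suggests linear growth.
  def ratios(timings) do
    timings
    |> Enum.chunk_every(2, 1, :discard)
    |> Enum.map(fn [prev, next] -> Float.round(next / prev, 2) end)
  end
end

# Timings in milliseconds, copied from the synthetic row of the results.
old_ms = [14, 51, 230, 800, 3235]
new_ms = [0.23, 0.46, 1.0, 2.1, 4.3]

IO.inspect(GrowthSketch.ratios(old_ms), label: "old ratios")
IO.inspect(GrowthSketch.ratios(new_ms), label: "new ratios")
```

The old ratios all fall between roughly 3.5 and 4.5, while the new ones stay close to 2, which is consistent with the O(n²) versus O(n) reading.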

Validation

  • Full mix test is green (the only failures are the pre-existing order-dependent gettext.extract_test.exs cases under Elixir 1.20+, present on the base branch as well).
  • Against a large production app: it recompiles cleanly through the changed codegen, and mix gettext.extract + mix gettext.merge leave the committed priv/gettext tree byte-identical and idempotent.

stage-review (Bot) commented Jun 16, 2026

Ready to review this PR? Stage has broken it down into 5 individual chapters for you:

  1. Optimize POT template merge to linear time
  2. Avoid redundant file reads during extraction
  3. Deduplicate plural info computation during compilation
  4. Optimize runtime interpolation pattern matching
  5. Fast-path binary values in interpolation

Chapters generated by Stage for commit 574dcba on Jun 16, 2026 3:47pm UTC.

@oliver-kriska oliver-kriska force-pushed the extraction-merge-perf branch from d0d2e15 to 4e11de9 Compare June 16, 2026 08:14
@oliver-kriska oliver-kriska changed the title Extraction/merge compile-time perf bundle (linear merge, single read, plural dedup) Speed up mix gettext.extract: linear template merge, single POT read, one plural-info computation Jun 16, 2026
@oliver-kriska oliver-kriska force-pushed the extraction-merge-perf branch from 4e11de9 to 0eb3655 Compare June 16, 2026 08:26
@oliver-kriska oliver-kriska changed the title Speed up mix gettext.extract: linear template merge, single POT read, one plural-info computation Speed up gettext: linear template merge, single POT read, plural-info dedup, and runtime interpolation fast paths Jun 16, 2026
@oliver-kriska oliver-kriska force-pushed the extract-from-attributes branch from d27c40c to abd4bf7 Compare June 16, 2026 12:58
@oliver-kriska oliver-kriska force-pushed the extraction-merge-perf branch from 6ae2799 to bc57759 Compare June 16, 2026 13:07
@oliver-kriska oliver-kriska force-pushed the extract-from-attributes branch from abd4bf7 to 11ae3cd Compare June 16, 2026 15:33
@oliver-kriska oliver-kriska force-pushed the extraction-merge-perf branch from bc57759 to 066379a Compare June 16, 2026 15:33

`Gettext.Extractor.merge_template/3` matched messages between the existing
and newly extracted templates with `Expo.Messages.find/2`, a linear scan
run once per existing message and once per new message. For a domain with
N existing and M extracted messages this is O(N*M) comparisons, and it runs
on every `mix gettext.extract` (and again in `prune_unmerged/2`), dominating
extraction time on large catalogs.

Index both sides by `Expo.Message.key/1` first, then look matches up in a
map / `MapSet`. `Message.key/1` is exactly what `Message.same?/2` (and thus
`Messages.find/2`) compares on, so the result is unchanged; `Map.put_new/3`
preserves the first-match-wins behavior of `Enum.find/2`. Output is
byte-identical, verified against a large production app's catalog.
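
The first-match-wins argument can be illustrated without Expo. In this minimal sketch, plain maps with a hypothetical `key` field stand in for messages and for `Expo.Message.key/1`. It only shows that an index built with `Map.put_new/3` returns the same item as an `Enum.find/2` scan, and it is not the code from the commit.

```elixir
defmodule FirstMatchSketch do
  # Linear scan: returns the first item whose key matches, or nil.
  def find_by_scan(items, key), do: Enum.find(items, &(&1.key == key))

  # Build the index once. Map.put_new/3 only inserts when the key is
  # absent, so the earliest item for each key is the one that is kept.
  def index(items) do
    Enum.reduce(items, %{}, fn item, acc -> Map.put_new(acc, item.key, item) end)
  end

  # Lookup against the prebuilt index, with no scan of the list.
  def find_by_index(index, key), do: Map.get(index, key)
end

# Hypothetical data: "hello" appears twice to exercise the duplicate case.
items = [
  %{key: "hello", source: :first},
  %{key: "bye", source: :only},
  %{key: "hello", source: :duplicate}
]

index = FirstMatchSketch.index(items)

for key <- ["hello", "bye", "missing"] do
  scan = FirstMatchSketch.find_by_scan(items, key)
  indexed = FirstMatchSketch.find_by_index(index, key)
  IO.inspect({key, scan == indexed})
end
```

Using `Map.put/3` in `index/1` instead would keep the last duplicate, which is why the choice of `Map.put_new/3` matters for output equivalence.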

`read_contents_and_parse/1` read the file with `File.read!/1` and then called
`PO.parse_file!/2`, which reads the very same file again internally. Pass the
contents we already have to `PO.parse_string!/2` instead.

`PO.parse_file!(path, file: path)` is defined as `File.read` followed by
`parse_string(contents, file: path)`, so the parsed result is identical; the
existing `File.read!/1` still raises on a missing file. This removes one disk
read per existing POT file on every `mix gettext.extract`.

`compile_po_file/5` derived the plural data twice: once in `nplurals/3` and
once in `compile_plural_forms/4`, each calling `Plural.plural_info/3`. That
function runs `Code.ensure_compiled!/1` and parses the `Plural-Forms` header,
so it is needless work repeated for every PO file across all locales and
domains.

Compute `plural_info` once in `compile_po_file/5` and pass it to both helpers.
The generated code is identical (the same value is escaped into the plural
dispatcher and fed to `nplurals/1`); only the duplicate computation is removed.

`to_interpolatable/1` called `:binary.compile_pattern/1` for both `"%{"` and
`"}"` on every invocation, then threaded the results through the recursion.
This runs on every runtime interpolation, including the common case where the
current locale has no translation for a msgid and falls back to interpolating
the msgid itself.

Match on the literal `"%{"` / `"}"` patterns directly in `:binary.split/2`
instead. The split result is identical, and these two-byte patterns gain
nothing from being precompiled, so this just removes the per-call work (a
compiled pattern can't be hoisted to a module attribute - it's a reference).
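
The equivalence claimed above can be sketched using only the Erlang `:binary` module. `SplitSketch` and its function names are hypothetical; they mirror the shape of the change rather than the actual body of `to_interpolatable/1`.

```elixir
defmodule SplitSketch do
  # Shape of the old code: compile the pattern on every call, then split.
  def split_compiled(string) do
    pattern = :binary.compile_pattern("%{")
    :binary.split(string, pattern)
  end

  # Shape of the new code: pass the literal binary straight to split/2.
  def split_literal(string), do: :binary.split(string, "%{")
end

# :binary.split/2 splits at the first occurrence only, which is all a
# recursive scanner needs at each step.
for input <- ["Hello %{name}!", "no placeholders", "%{a} and %{b}"] do
  same? = SplitSketch.split_compiled(input) == SplitSketch.split_literal(input)
  IO.inspect({input, SplitSketch.split_literal(input), same?})
end
```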

In the runtime interpolation path, binding values were always run through
`to_string/1`, dispatching the `String.Chars` protocol even when the value is
already a binary - the common case (names, labels, preformatted strings).

Add an `is_binary(value)` clause that uses the value directly. `to_string/1`
on a binary returns it unchanged, so the result is identical; this only skips
the protocol dispatch.
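
A minimal sketch of that fast path follows, assuming a helper shaped like the one described. `BindingSketch.to_text/1` is a hypothetical name and not the function in the gettext source.

```elixir
defmodule BindingSketch do
  # Fast path: a binary is already the string we need, so return it as is.
  def to_text(value) when is_binary(value), do: value

  # Fallback: everything else still goes through the String.Chars protocol.
  def to_text(value), do: to_string(value)
end

# For every value, the result should match plain to_string/1.
for value <- ["Ada", 42, :ok, 3.5] do
  IO.inspect({value, BindingSketch.to_text(value) == to_string(value)})
end
```

Clause order matters here: the guarded clause has to come first so that binaries never reach the protocol dispatch.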