Add recompile-free extraction via persisted module attributes#437
Add recompile-free extraction via persisted module attributes#437oliver-kriska wants to merge 4 commits into
Conversation
|
Do we even need the module attribute? Maybe instead we could simply start the Agent on demand during compilation with the existing mode? Then later we can swap it to use ETS or whatever. |
|
but how will we persist cross compilation? So it would mean we will have to still do force compile which can be pretty expensive. So basically your suggested solution is how it works now if I'm not wrong, with that add_message function. |
|
Oh, sorry, I have completely misread it. I see. The reason I don't like the module attributes is that, IIRC, they are loaded once the code is loaded, and that's also a runtime cost. Another option would be to track additional files per module? And then we prune those files when assembing the .po if they don't have an equivalent |
|
@josevalim What do you think about just disabling those attributes for I like the approach with module attributes because you have the same incremental compilation as normal and you can extract at any point by enumerating the modules / their attributes. You could for example extract while there's a running phoenix dev server. |
|
it should be already there, see those lines defp extraction_environment? do
cond do
not Code.ensure_loaded?(Mix) ->
false
is_nil(Mix.Project.get()) ->
false
true ->
gettext_config = Mix.Project.config()[:gettext] || []
extraction_environments =
gettext_config[:extraction_environments] ||
Application.get_env(:gettext, :extraction_environments) ||
[:dev]
Mix.env() in extraction_environments
end
end |
| Mix.Task.reenable("compile.elixir") | ||
| Mix.Task.reenable("compile.app") | ||
| Mix.Task.run("compile", []) | ||
| Mix.Task.run("compile.elixir", []) |
There was a problem hiding this comment.
I don't understand why we need all of this. We can compile and assume that already puts everything in place? Or it may be better if we use separate tasks maybe? mix gettext.generate or something?
There was a problem hiding this comment.
make sense, trying something with mix gettext.extract_from_attributes it's too long but .generate is "too wide meaning". Open for other ideas
There was a problem hiding this comment.
Do we want to maintain two extraction features in parallel? I trust that this approach will be stable.
There was a problem hiding this comment.
I’d maintain both for now. This package is likely a decade old, this is an essential part, id not necessarily assume we won’t have blockers and regressions. We can merge them together later.
|
Ok, agreed then. I would probably make it like this though: So we are not accessing It may also be worth treating extraction with and without this feature as completely different things? They can share some steps (such as the merging and fusing) but it may be clearer to treat them as separate code paths execution wise. |
…utes task Following Jose's review on PR elixir-gettext#437, split the attribute-based extraction out of the --from-attributes flag into a dedicated mix gettext.extract_from_attributes task. The two extraction modes are now separate code paths that share only the POT merge/fuse and output steps via Mix.Tasks.Gettext.Extract.process/3. Also drop the incremental_compile/0 reenable dance. The new task does a plain mix compile (a no-op when the project is already compiled), since by the time it reads the persisted attributes the project is already compiled and up to date.
d27c40c to
abd4bf7
Compare
mix gettext.extract can only discover messages by force-recompiling the whole project, because extraction happens during macro expansion and the results only exist in the extractor agent for the duration of that compilation. On large projects this dominates extraction time. Implement the design proposed by @maennchen in elixir-gettext#373: during normal compilation, when the current Mix environment is included in the new :extraction_environments gettext configuration (default [:dev]), the gettext macros persist each message into an accumulated, persisted @__gettext_messages__ attribute of the calling module, and backends persist a @__gettext_backend_module__ marker. A new experimental --from-attributes flag on mix gettext.extract then runs a normal incremental compile, reads the persisted entries back from the compiled BEAM files with :beam_lib (without loading modules), feeds them into the existing extractor agent, and reuses the unchanged Gettext.Extractor.pot_files/2 merge logic. Staleness needs no bookkeeping: changed files are recompiled by the incremental compile and get fresh attributes, and deleted files' beams are pruned by the compiler. --check-up-to-date composes with the new flag and gets the same speedup. Environments not listed in :extraction_environments (such as :prod) persist nothing, so release artifacts are unaffected. If the scan finds no persisted messages or backends at all, the task raises with instructions instead of silently producing empty output. The force-recompile path is untouched and remains the default. Old-path and new-path POT output is byte-identical, covered by tests including macro-generated messages (gettext calls injected into modules that do not literally mention them) and plural/context/comment/noop variants.
Address review findings on the --from-attributes prototype:
* Validate the shape of persisted message entries when scanning BEAM
files, skipping malformed ones instead of crashing the scan with a
CaseClauseError (an unrelated module could carry an attribute with
the same name, and future format changes need a safe path).
* Reenable "compile" and friends in the incremental compile, so that
extraction still picks up source changes when the compile tasks
already ran in the current VM (for example, through a task alias).
* Guard the backend marker registration with Module.has_attribute?/2,
matching persist_message/6.
Test improvements: assert fixed content on the rebuilt POT (not only
equality with the recompilation path), reset the extractor agent's full
initial state between tests, make reference line assertions robust to
fixture edits, pin the stale-POT filename in the --check-up-to-date
assertion, and restore Code.compiler_options/1 after the suite.
New coverage: _with_backend macro variants, reference merging when two
modules share a msgid, --from-attributes combined with --merge, custom
:priv backends, and extraction across umbrella apps (each app gets its
own POT through the task's recursion).
…utes task Following Jose's review on PR elixir-gettext#437, split the attribute-based extraction out of the --from-attributes flag into a dedicated mix gettext.extract_from_attributes task. The two extraction modes are now separate code paths that share only the POT merge/fuse and output steps via Mix.Tasks.Gettext.Extract.process/3. Also drop the incremental_compile/0 reenable dance. The new task does a plain mix compile (a no-op when the project is already compiled), since by the time it reads the persisted attributes the project is already compiled and up to date.
abd4bf7 to
11ae3cd
Compare
Replace the Mix-environment gate (:extraction_environments) with a
per-backend application-environment toggle, read without touching Mix:
config :gettext, MyApp.Gettext, automatic_extraction: true
Reading from the application environment instead of Mix.Project/Mix.env
keeps Mix out of the compile-time macro path, is off by default, and is
off outside of Mix (such as runtime compilation in a release), so prod
beams stay free of the persisted attributes.
The toggle is keyed under the :gettext application (not the backend's own
otp_app) so the persist gate never calls the backend at a gettext call
site, avoiding a compile-time dependency that would recompile every
gettext caller whenever PO files change.
The persisted data, BEAM reading, and POT merging are unchanged, so the
--from-attributes output is byte-identical to the recompilation path.
…utes task Following Jose's review on PR elixir-gettext#437, split the attribute-based extraction out of the --from-attributes flag into a dedicated mix gettext.extract_from_attributes task. The two extraction modes are now separate code paths that share only the POT merge/fuse and output steps via Mix.Tasks.Gettext.Extract.process/3. Also drop the incremental_compile/0 reenable dance. The new task does a plain mix compile (a no-op when the project is already compiled), since by the time it reads the persisted attributes the project is already compiled and up to date.
11ae3cd to
c357fca
Compare
|
fyi I updated code but I want to properly test it with our big translations, but feedback welcome. Also with this oliver-kriska#2 |
Claude Code generated text:
Motivation
mix gettext.extractdiscovers messages by force-recompiling the whole project, because extraction happens during gettext macro expansion and the results only live in the extractor agent for that single compilation. On largeprojects this dominates extraction time, and because
--check-up-to-dateruns in CI on every change, the cost is paid constantly.This implements the recompile-free design @maennchen proposed in #373 (#373 (comment)). That issue was closed once the immediate duplicate-reference / module redefinition bug was fixed in v1.0.2, but the extraction redesign itself was never implemented.
Approach
Two halves:
Mix.env()is in the new:extraction_environmentsconfig (default[:dev]), each gettext macro persists its message into an accumulated,persist: truemodule attribute@__gettext_messages__on the calling module. Backends additionally persist a@__gettext_backend_module__marker.mix gettext.extract --from-attributesruns a normal incremental compile, then reads those attributes back from the compiled BEAM files with:beam_lib.chunks(beam, [:attributes])(without loading the modules), feeds them into the existingGettext.ExtractorAgent, and reuses the unchangedGettext.Extractor.pot_files/2merge logic.The force-recompile path is untouched and remains the default.
Why it stays correct
--check-up-to-datecomposes with--from-attributesand gets the same speedup.:extraction_environments(notably:prod) persist nothing — no attributes, no marker. Unit-tested (a fixture compiled in:testcarries no gettext attributes on any BEAM).mix compile --forcerather than silently pruning every autogenerated message or silently falling back to a surprise full recompile.Benchmarks
Large production Phoenix app (1,875 files, 11 domains × 16 locales, ~4,700 msgids; Elixir 1.20 / OTP 29):
--from-attributesgettext.extract --mergegettext.extract --check-up-to-date(CI)In that app's CI the extraction step dropped from ~66–94s to ~4–11s and has run on every merge for several days with identical POT output. Dev BEAM size grew ~2.7% (≈ +2.3 MiB for ~4,700 msgids); prod is unaffected.
Scope / limitations
--from-attributesand set:extraction_environments.Gettext.Macrosbackends only. The deprecateduse Gettext, otp_app: ...backends generate their own*_noopmacros and don't persist attributes; the zero-found error catches a legacy-only project that opts in.extraction_apps(cross-app dependency extraction) deferred.gettext.extractis@recursive, so each umbrella child already scans its own compile path (covered by a fixture test); extracting messages that livein dependencies is new functionality for a follow-up.
Design notes for review
:beam_lib, rather than the@before_compile__translations__/0function from the original sketch: same data, no function injected into user modules, and reading attributes doesn't load the modules(so scanning 1,000+ BEAMs stays cheap and side-effect-free). Happy to switch to the generated-function approach if you prefer it.
ExtractorAgentrather than refactoringpot_files/2, so duplicate-message merging, plural-mismatch warnings and comment de-duplication are reused verbatim — equivalence is structural, not re-implemented.Tests
New
test/mix/tasks/gettext.extract_from_attributes_test.exscovers: old-path / new-path byte-equivalence (fresh + idempotent), macro-generated messages (gettext calls injected into modules that don't textually contain them),_with_backendvariants, reference merging across modules sharing a msgid,--from-attributes --merge, custom:priv,--check-up-to-datestaleness via the incremental compile, umbrella extraction, and prod-clean BEAMs.