Implement flat-first archiving for Action reports to improve limiting and memory consumption #24191
Open
nathangavin (Contributor) reviewed on Mar 24, 2026:
It generally looks good to me. With regard to testing, would it be feasible to directly compare legacy hierarchy aggregation with flat aggregation to confirm that both produce the same output?
Other than that I don't see any issues.
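One way such a comparison could look is sketched below. This is a minimal sketch, not Matomo code: the nested-dict report model and the helper names (flatten, merge_trees, merge_flat) are hypothetical, and no row limits are applied. With limits in play the two paths may legitimately differ in how Others rows are formed.

```python
from collections import defaultdict

# Hypothetical data model (not a Matomo API): a hierarchical report is a
# nested dict whose leaves are hit counts.

def flatten(tree, prefix=""):
    """Turn a nested {label: subtree_or_count} dict into {full_path: count}."""
    flat = {}
    for label, value in tree.items():
        path = f"{prefix}/{label}"
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

def merge_trees(a, b):
    """Legacy-style merge: walk both hierarchies and sum matching leaves.
    Assumes a label is either a subtree in both inputs or a leaf in both."""
    merged = dict(a)
    for label, value in b.items():
        if label not in merged:
            merged[label] = value
        elif isinstance(value, dict) and isinstance(merged[label], dict):
            merged[label] = merge_trees(merged[label], value)
        else:
            merged[label] = merged[label] + value
    return merged

def merge_flat(a, b):
    """Flat-first merge: sum counts per full path, no recursion needed."""
    merged = defaultdict(int)
    for table in (a, b):
        for path, count in table.items():
            merged[path] += count
    return dict(merged)

# Made-up sample data for two days.
day1 = {"blog": {"post-a": 3, "post-b": 1}, "about": 2}
day2 = {"blog": {"post-a": 4, "post-c": 5}, "pricing": 7}

legacy = flatten(merge_trees(day1, day2))        # merge hierarchies, then flatten
flat = merge_flat(flatten(day1), flatten(day2))  # flatten first, then merge
assert legacy == flat
```

A real test would build both results from the same tracked fixture data and compare the resulting report rows rather than plain dicts.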
…or aggregating higher periods with mixed tables (flat + hierarchical)
…+ new flat tables)
Summary
This PR introduces flat-first archiving for Action reports, controlled by a new config flag:
datatable_archiving_maximum_rows_actions_flat
Enable / Disable
0 or not set: flat-first archiving is disabled.
> 0: flat-first archiving is enabled.
A useful initial value is 50000, which matches the current query/report row limit.
Archiving approach changes
With flat-first enabled:
merged so re-archiving everything is not required.
Limits and Others behavior
datatable_archiving_maximum_rows_actions_flat.
Others is determined during flat limiting, so behavior is more consistent and avoids extra secondary Others rows created by hierarchical re-limiting.
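The idea of deciding Others once, during flat limiting, can be illustrated roughly as follows. This is a sketch under assumptions: limit_flat is a hypothetical helper, not the PR's implementation, and it assumes the summary row counts toward the limit, which the description above does not state.

```python
def limit_flat(rows, max_rows, others_label="Others"):
    """Keep the largest rows and fold the remainder into one summary row.

    rows: dict mapping a full path to a count.
    max_rows: total rows allowed, including the summary row (assumption);
              0 or less means no limiting in this sketch.
    """
    if max_rows <= 0 or len(rows) <= max_rows:
        return dict(rows)
    # Sort by count descending, then by path, so ties break predictably.
    ranked = sorted(rows.items(), key=lambda item: (-item[1], item[0]))
    kept = dict(ranked[: max_rows - 1])
    kept[others_label] = sum(count for _, count in ranked[max_rows - 1:])
    return kept

# Made-up sample data.
rows = {"/a": 10, "/b": 7, "/c": 3, "/d": 2, "/e": 1}
limited = limit_flat(rows, 3)
# limited == {"/a": 10, "/b": 7, "Others": 6}
```

Because the limit is applied once to the flat table, only one summary row is produced and the total count is preserved. Limiting each hierarchy level separately can yield a summary row per level.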
Impact on memory and archiving time
For high-cardinality Action data (many distinct URLs, changing month-to-month), this reduces resource pressure by
avoiding repeated deep hierarchical merges during period aggregation.
Testing
The changes include a couple of tests to prove that flat-first archiving works in general. In addition, the full test suite was run with flat archiving enabled to make sure no other tests regress unexpectedly. Most of the resulting failures were fixed by separate, unrelated fixes (see linked PRs).
One test mismatch currently remains. See https://github.com/matomo-org/matomo/actions/runs/23338988295
URL Metadata / Segment Mismatch Note
There is an existing inconsistency in how Matomo merges duplicate rows with the same label but different metadata (for
example URL rows where one variant has different segment metadata, empty segment metadata, or different include-depth
behavior like include2 vs include4).
With flat-first archiving, this can become more visible because row merging happens through a slightly different path
than pure legacy hierarchy aggregation. The underlying issue is not specific to flat-first itself: when multiple
candidate rows exist, metadata winner selection is not fully deterministic/explicit in all merge paths.
Practical impact:
order/source.
during row aggregation.
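The order dependence described in this note can be reproduced in miniature. The sketch below is purely illustrative: merge_rows and its "first row seen keeps its metadata" rule are assumptions chosen to show the effect, not Matomo's actual merge logic, and the row contents are made up.

```python
def merge_rows(rows):
    """Sum rows that share a label; metadata is taken from the first row seen.

    The first-seen rule is an assumption for illustration only.
    """
    merged = {}
    for row in rows:
        label = row["label"]
        if label not in merged:
            merged[label] = {
                "label": label,
                "hits": row["hits"],
                "metadata": dict(row["metadata"]),
            }
        else:
            # Metrics are summed, but the existing metadata is left untouched.
            merged[label]["hits"] += row["hits"]
    return merged

# Two variants of the same URL row that differ only in segment metadata.
variant_a = {"label": "/index", "hits": 5, "metadata": {"segment": "pageUrl==/index"}}
variant_b = {"label": "/index", "hits": 2, "metadata": {"segment": ""}}

first = merge_rows([variant_a, variant_b])
second = merge_rows([variant_b, variant_a])

# Metrics agree regardless of input order, but the surviving metadata does not.
assert first["/index"]["hits"] == second["/index"]["hits"] == 7
assert first["/index"]["metadata"] != second["/index"]["metadata"]
```

Any change to the order in which rows reach the merge step, such as switching from hierarchical to flat aggregation, can therefore change which metadata survives unless the winner is chosen by an explicit rule.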