
Performance Testing Infrastructure Bits #12365

Open
poikilotherm wants to merge 23 commits into develop from 11405-optimize-huge-exports


Contributor

@poikilotherm poikilotherm commented Apr 28, 2026

What this PR does / why we need it:

The current codebase offers no way to create larger object graphs for datasets, which performance testing requires.
Many problematic pieces of code and design choices only show up as problems once you hit them with larger quantities of data, in isolation and under controlled circumstances. Classic tests also fail to detect the many small losses that add up when scaling to more users.

This PR introduces:

  1. A fluent API to describe such object graphs in test code, build them as fixtures, wire them together, and populate them with primitive data.
  2. A way to bootstrap a fully controlled JPA entity manager with a proxy data source to analyze the ORM's SQL.
  3. Ways to run code against a real Postgres database, using that entity manager.
  4. Documentation on how to use and extend it.
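To illustrate the idea behind item 1, here is a minimal sketch of a fluent builder that describes a dataset with versions and files. All names in it (`FixtureSketch`, `DatasetBuilder`, `withVersion`, the `*Spec` records) are hypothetical and chosen for illustration only; they are not the `DatasetFixture` API this PR adds, and the sketch produces plain records rather than JPA entities.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not the DatasetFixture API from this PR.
final class FixtureSketch {

    // Plain, immutable descriptions of the object graph.
    record FileSpec(String name, boolean tabular) {}
    record VersionSpec(int number, List<FileSpec> files) {}
    record DatasetSpec(String title, List<VersionSpec> versions) {}

    static final class DatasetBuilder {
        private final String title;
        private final List<VersionSpec> versions = new ArrayList<>();

        private DatasetBuilder(String title) {
            this.title = title;
        }

        // Entry point of the fluent chain.
        static DatasetBuilder dataset(String title) {
            return new DatasetBuilder(title);
        }

        // Adds one version with the requested number of standard and tabular files.
        DatasetBuilder withVersion(int standardFiles, int tabularFiles) {
            List<FileSpec> files = new ArrayList<>();
            for (int i = 0; i < standardFiles; i++) {
                files.add(new FileSpec("file-" + i + ".bin", false));
            }
            for (int i = 0; i < tabularFiles; i++) {
                files.add(new FileSpec("table-" + i + ".tab", true));
            }
            // Versions are numbered in the order they are added, starting at 1.
            versions.add(new VersionSpec(versions.size() + 1, List.copyOf(files)));
            return this;
        }

        // Freezes the description into an immutable graph.
        DatasetSpec build() {
            return new DatasetSpec(title, List.copyOf(versions));
        }
    }
}
```

A test could then describe a large graph in one expression, for example `FixtureSketch.DatasetBuilder.dataset("huge").withVersion(10_000, 500).build()`, and scale the counts without touching the wiring code.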

Which issue(s) this PR closes:

None. It's an offshoot of #11405 (related), but will not close it. It has already uncovered #12362 (but does not fix it).

Special notes for your reviewer:
You should take a look at the documentation and the example tests, and maybe play around with them some more. Use a profiler to capture more details, and so on.

Suggestions on how to test this:
It's a self-contained package with infrastructure and tests 😉

Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Nope

Is there a release notes update needed for this change?:
Not sure. Do we mention these things?

Additional documentation:
None. 🔋 included.

…rt for tabular and standard files #11405

Introduced a new `DatasetFixture` structure and its builder API to simplify test dataset creation. This includes support for datasets, versions, files (tabular and standard), and associated metadata. Added unit tests and minimal populator implementation for scalar field initialization.
…rt with JpaTestBootstrap setup #11405

Introduced `HugeDatasetExportPerformanceIT` for testing large dataset export performance. Added `JpaTestBootstrap` to streamline test setup with Testcontainers and JPA. Updated dependencies to include `datasource-proxy` for query tracking and enhanced test tooling.

Note: this doesn't work yet, as we need a fixture generator first.
…o dedicated classes

Simplified method signatures and improved maintainability by decoupling builder-internal types from fixture populators. Updated references across the codebase to use the new context classes.
…setFixtureTest`

Avoid measuring the speed of writing to the stream; only the speed of converting the JSON in-memory model to a String is of interest.
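The separation described in this commit can be sketched as follows: time only the in-memory conversion, then hand the finished String to the sink outside the timed region. This is a minimal sketch under stated assumptions, not the code from the commit. `ConversionTimingSketch`, `render`, and `convertThenWrite` are hypothetical names, and `render` merely stands in for the real JSON conversion.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical illustration only: not the test code from this PR.
final class ConversionTimingSketch {

    // A result together with the wall-clock time it took to produce.
    record Timed<T>(T value, long nanos) {}

    // Times exactly the work inside the supplier and nothing else.
    static <T> Timed<T> time(Supplier<T> work) {
        long start = System.nanoTime();
        T value = work.get();
        return new Timed<>(value, System.nanoTime() - start);
    }

    // Stand-in for converting an in-memory JSON model to a String.
    static String render(int entries) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < entries; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"id\":").append(i).append('}');
        }
        return sb.append(']').toString();
    }

    // Returns the conversion time only; the write to the sink happens after the timer has stopped.
    static long convertThenWrite(int entries, Consumer<String> sink) {
        Timed<String> converted = time(() -> render(entries));
        sink.accept(converted.value());
        return converted.nanos();
    }
}
```

A single `System.nanoTime()` measurement like this is only a rough indicator; it does not account for JIT warm-up or garbage collection pauses.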
…dularity and add robust variable metadata handling
…ureTest` setup

Otherwise the initialization would count towards the execution time of building the POJOs.
Here we are only interested in a smoke test. We will create more distinct performance-related tests.
…and `TermsOfUseAndAccess` in `MinimalPopulator`
…bject` properties in fixture builders (Dataset and DataFile)
…its integration across fixture builders and tests using a new recipe
@poikilotherm poikilotherm self-assigned this Apr 28, 2026
@poikilotherm poikilotherm added the labels Type: Feature, Feature: Metadata, Component: Code Infrastructure, Feature: Performance & Stability, Feature: Developer Guide, Component: Containers, Size: 3, and D: Dataset: large number of files (https://github.com/IQSS/dataverse-pm/issues/27) on Apr 28, 2026

coveralls commented Apr 28, 2026

Coverage Status

coverage: 25.053% (+0.09%) from 24.963% — 11405-optimize-huge-exports into develop


… and recipes

Provides comprehensive guides and examples for using the dataset fixture generator, recipes, and populators. Includes usage scenarios, architecture overview, persistence guidance, and extension recommendations to support test development.

… tests

This utility is replaced by a more sophisticated JUnit extension
…nit extension

Adds `JpaEntityManagerService` for managing JPA entity lifecycle and transaction operations in tests, the `JpaPerformanceTest` annotation to streamline performance test setup with Testcontainers, and the `JpaPerformanceTestExtension` to handle shared PostgreSQL container and database isolation. Updates `pom.xml` with required dependencies for these utilities.
…ceTest` utilities

Replaces `JpaTestBootstrap` and manual container management with the `JpaPerformanceTest` annotation and `JpaEntityManagerService`. Removes redundant `@AfterAll` cleanup logic.
Includes resource-local transaction management, entity scanning setup, and matching production-like JPA properties, but tailored for test scenarios.
…tup and best practices

Includes instructions for running performance tests, database-bound testing with Testcontainers, and using `JpaEntityManagerService`. Provides example test class, configuration details, and advanced usage for query profiling.
Adds support for Markdown files in Sphinx configuration. Upgrades `myst-parser` from `2.0.0` to `4.0.0` in requirements for compatibility.
@poikilotherm poikilotherm marked this pull request as ready for review May 2, 2026 00:48
@poikilotherm
Contributor Author

I think this is ready for review and inclusion in the codebase. It's not all done, but there is a reasonable amount of functionality, a whole bunch of documentation, and even a first test using it. It is a solid base for further extension and especially for creating performance tests. Marking this as "ready for review".

@poikilotherm poikilotherm moved this from Important to Proposals in Forschungszentrum Jülich May 2, 2026
@poikilotherm poikilotherm moved this to Ready for Review ⏩ in IQSS Dataverse Project May 2, 2026

github-actions Bot commented May 2, 2026

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:11405-optimize-huge-exports
ghcr.io/gdcc/configbaker:11405-optimize-huge-exports

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

