Add explicit index template mappings and reporting config wiring for OpenSearchMetricsSink #2569

Open
Shivani-techno wants to merge 12 commits into opensearch-project:main from Shivani-techno:validations__scivani

@Shivani-techno

Description

Adds explicit index template creation to OpenSearchMetricsSink so validation metrics
indices have consistent, optimized field mappings instead of relying on dynamic mapping.

Changes:

  • OpenSearchMetricsSink: Creates composable index template at startup with explicit
    types (keyword for aggregation fields, date for timestamp, nested for comparisons,
    text for request bodies with no length limit)
  • ReportingConfig: New YAML config parser for reporting framework settings
  • ShimMain: Added --reporting-config CLI parameter to enable validation reporting
  • ShimProxy/MultiTargetRoutingHandler: Wired MetricsReceiver into the request pipeline
    to collect and publish validation metrics after each request
  • docker-compose.validation.yml: Added reporting config mount for shim-solr-primary
  • reporting-config.yaml: Sample configuration with placeholder credentials
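
For illustration, a minimal sketch of what a composable index template body with explicit mappings could look like. The class name and the field names (`timestamp`, `endpoint`, `request_body`, `comparisons`) are hypothetical placeholders chosen to match the types listed above; they are not taken from the PR's actual schema.

```java
import java.util.Locale;

/** Sketch only: field names are hypothetical, not the PR's actual schema. */
final class IndexTemplateSketch {
    private IndexTemplateSketch() {}

    /**
     * Builds a composable index template body that applies to indices
     * named "<indexPrefix>-*". The prefix is inserted as-is, so it is
     * assumed to contain no characters that need JSON escaping.
     */
    static String buildIndexTemplateJson(String indexPrefix) {
        return String.format(Locale.ROOT, """
            {
              "index_patterns": ["%s-*"],
              "template": {
                "mappings": {
                  "properties": {
                    "timestamp":    { "type": "date" },
                    "endpoint":     { "type": "keyword" },
                    "request_body": { "type": "text" },
                    "comparisons":  { "type": "nested" }
                  }
                }
              }
            }
            """, indexPrefix);
    }
}
```

A body like this would be sent with `PUT _index_template/<template-name>` before the first document is indexed, so that new indices matching the pattern pick up the explicit types instead of dynamically inferred ones.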

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@akshay2000
Collaborator

Please fix the DCO failure.

shim-solr-primary:
<<: *shim-base
volumes:
- transform-dist:/transforms:ro
Collaborator

Why was this change required?

Author

The volume override is scoped only to shim-solr-primary (dual mode), not the base. The transform-dist volume is re-declared because a YAML merge replaces the volumes list rather than appending to it; without the re-declaration, the transform JS files wouldn't be mounted.

<<: *shim-base
volumes:
- transform-dist:/transforms:ro
- ./reporting-config.yaml:/config/reporting-config.yaml:ro
Collaborator

Reporting config makes sense only when we are running in dual mode. Adding it here will mount the config on all the shims - it will largely remain unused.

}

/** Simple YAML flattener — handles indentation-based nesting, strips comments. */
private static Map<String, String> flattenYaml(String content) {
Collaborator

We should really consider using a YAML parsing library. YAML is not just indent based line parsing.

Author

Agreed. I'll push a revision that uses a YAML parsing library.

@codecov

codecov bot commented Mar 30, 2026

Codecov Report

❌ Patch coverage is 74.17840% with 55 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.67%. Comparing base (a9e040e) to head (108c361).
⚠️ Report is 280 commits behind head on main.

Files with missing lines | Patch % | Lines
...ransform/shim/reporting/OpenSearchMetricsSink.java | 71.02% | 21 Missing and 10 partials ⚠️
...opensearch/migrations/transform/shim/ShimMain.java | 61.53% | 8 Missing and 2 partials ⚠️
...ions/transform/shim/reporting/ReportingConfig.java | 84.37% | 2 Missing and 8 partials ⚠️
...ransform/shim/netty/MultiTargetRoutingHandler.java | 71.42% | 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2569      +/-   ##
============================================
+ Coverage     72.09%   72.67%   +0.57%     
- Complexity       65       90      +25     
============================================
  Files           695      705      +10     
  Lines         32018    32728     +710     
  Branches       2714     2809      +95     
============================================
+ Hits          23084    23784     +700     
+ Misses         7694     7638      -56     
- Partials       1240     1306      +66     
Flag | Coverage Δ
gradle | 69.02% <74.17%> (+0.78%) ⬆️
node | 92.55% <ø> (+0.04%) ⬆️
python | 76.67% <ø> (+0.14%) ⬆️


@Shivani-techno Shivani-techno had a problem deploying to migrations-cicd-require-approval March 30, 2026 14:56 — with GitHub Actions Error
@Shivani-techno Shivani-techno had a problem deploying to migrations-cicd-require-approval March 30, 2026 15:01 — with GitHub Actions Error
@Shivani-techno Shivani-techno had a problem deploying to migrations-cicd-require-approval March 31, 2026 04:44 — with GitHub Actions Error
@Shivani-techno Shivani-techno had a problem deploying to migrations-cicd-require-approval March 31, 2026 04:47 — with GitHub Actions Error
@Shivani-techno Shivani-techno had a problem deploying to migrations-cicd-require-approval March 31, 2026 09:21 — with GitHub Actions Error
@Shivani-techno Shivani-techno had a problem deploying to migrations-cicd-require-approval April 1, 2026 11:46 — with GitHub Actions Error
@Shivani-techno Shivani-techno had a problem deploying to migrations-cicd-require-approval April 1, 2026 11:52 — with GitHub Actions Error
- discovery.type=single-node
- DISABLE_SECURITY_PLUGIN=true
- OPENSEARCH_INITIAL_ADMIN_PASSWORD=Admin_1234!
- OPENSEARCH_JAVA_OPTS=-Xms256m -Xmx256m
Collaborator

Why did we need to modify the existing OS container definition?

Author

I added it because running two OpenSearch instances (destination + reporting) caused the destination to be OOM-killed. The destination doesn't need the limit, though, so I removed the heap limit from the destination OS container and set the reduced heap only on the reporting node.

Member

@Shivani-techno Take a look at the CPU/memory allocated to the Docker runner in your Docker Desktop config; it may have caused the OOM you were seeing.

- "18080:8080"
volumes:
- transform-dist:/transforms:ro
- ../TrafficCapture/SolrTransformations/docker/reporting-config.yaml:/config/reporting-config.yaml:ro
Collaborator

Please create a reporting config for this harness specifically. The one in the other directory should serve only as a template.

- opensearch=request:/transforms/solr-to-opensearch-request.js,response:/transforms/solr-to-opensearch-response.js
- --watchTransforms

# Mode 3: Dual-target, Solr primary — returns Solr response, validates against OpenSearch
Collaborator

Since we are modifying the test harness (a.k.a. the dev sandbox), we should not modify this file at all.

Author

Done, reverted the changes.

@Shivani-techno Shivani-techno had a problem deploying to migrations-cicd-require-approval April 2, 2026 04:41 — with GitHub Actions Error
@Shivani-techno Shivani-techno had a problem deploying to migrations-cicd-require-approval April 3, 2026 08:44 — with GitHub Actions Error
@Shivani-techno Shivani-techno had a problem deploying to migrations-cicd-require-approval April 3, 2026 08:50 — with GitHub Actions Error
…porting node and Dashboards

Signed-off-by: Shivani - <scivani@amazon.com>
@Shivani-techno Shivani-techno had a problem deploying to migrations-cicd-require-approval April 3, 2026 09:18 — with GitHub Actions Error
retries: 30

opensearch-reporting:
image: opensearchproject/opensearch:3.3.0
Member

Let's make this 3.5

- discovery.type=single-node
- DISABLE_SECURITY_PLUGIN=true
- OPENSEARCH_INITIAL_ADMIN_PASSWORD=Admin_1234!
- OPENSEARCH_JAVA_OPTS=-Xms256m -Xmx256m
Member

Bump Xmx to at least 1 GB; we've had issues with smaller OpenSearch clusters.

Member

@AndreKurait left a comment

Please take a look at the tuple interface in the replayer. See #2605 which can be leveraged for high performance validation here.

* - headers use dynamic mapping since HTTP header values are always strings
* and individual headers vary per request
*/
private String buildIndexTemplateJson() {
Member

Can we move this to a preflight step that runs before the shim starts up?

private String buildIndexTemplateJson() {
return String.format("""
{
"index_patterns": ["%s-*"],
Member

Do we have any test in here that validates that this works against the reporting OpenSearch version?

if (buffer.size() >= bulkSize) {
List<ValidationDocument> batch = new ArrayList<>(buffer);
buffer.clear();
scheduler.execute(() -> sendBulk(batch));
Member

This is going to be a bottleneck: submission is single-threaded and waits on the bulk request.

try {
synchronized (buffer) {
buffer.add(document);
if (buffer.size() >= bulkSize) {
Member

Batching on the number of documents alone, without considering their size, is troublesome. E.g., if each doc were 20 MB, this would be a 2 GB request, which OpenSearch would reject.
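
One way to address this, as a rough sketch: flush when either a document-count limit or a byte-size limit would be exceeded. `SizeAwareBatcher`, `maxDocs`, and `maxBytes` are hypothetical names, and the sketch assumes documents are already serialized to strings; it is not the PR's code or the existing RFS implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Sketch: batches serialized documents by count and by total UTF-8 size. */
final class SizeAwareBatcher {
    private final int maxDocs;
    private final long maxBytes;
    private final List<String> buffer = new ArrayList<>();
    private long bufferedBytes = 0;

    SizeAwareBatcher(int maxDocs, long maxBytes) {
        this.maxDocs = maxDocs;
        this.maxBytes = maxBytes;
    }

    /**
     * Buffers one serialized document. If adding it would push the current
     * batch past either limit, the current batch is returned for sending and
     * the new document starts the next batch; otherwise an empty list is
     * returned. A single document larger than maxBytes still forms a batch
     * of its own.
     */
    synchronized List<String> add(String serializedDoc) {
        // +1 accounts for the newline that terminates each NDJSON line.
        long docBytes = serializedDoc.getBytes(StandardCharsets.UTF_8).length + 1;
        List<String> ready = List.of();
        boolean countFull = buffer.size() >= maxDocs;
        boolean bytesFull = bufferedBytes + docBytes > maxBytes;
        if (!buffer.isEmpty() && (countFull || bytesFull)) {
            ready = drain();
        }
        buffer.add(serializedDoc);
        bufferedBytes += docBytes;
        return ready;
    }

    /** Returns whatever is buffered and resets the batcher (e.g. on shutdown or a timer). */
    synchronized List<String> drain() {
        List<String> batch = new ArrayList<>(buffer);
        buffer.clear();
        bufferedBytes = 0;
        return batch;
    }
}
```

The caller would hand any non-empty list returned by `add` to the bulk sender, and call `drain` periodically so that a partially filled batch is not held indefinitely.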

Signed-off-by: Shivani - <scivani@amazon.com>
Member

@AndreKurait left a comment

This comment exists so I can use "Changes since review" after force-pushes.

Shivani - added 5 commits April 3, 2026 22:07
…emplate, improve coverage

Signed-off-by: Shivani - <scivani@amazon.com>
Signed-off-by: Shivani - <scivani@amazon.com>
Signed-off-by: Shivani - <scivani@amazon.com>
Signed-off-by: Shivani - <scivani@amazon.com>
Signed-off-by: Shivani - <scivani@amazon.com>
public class OpenSearchMetricsSink implements MetricsSink {

private static final Logger log = LoggerFactory.getLogger(OpenSearchMetricsSink.class);
private static final ObjectMapper MAPPER = new ObjectMapper();
Member

There should be a mapper factory we should use

}
}

private long estimateDocSize(ValidationDocument document) {
Member

Note, we have existing code in this project to do this without reserializing. See how RFS generates bulk documents

ndjson.append(MAPPER.writeValueAsString(doc)).append("\n");
}

var requestBuilder = HttpRequest.newBuilder()
Member

We have existing code that does this in a robust way (e.g., applying GZIP client compression). Ideally we would use the same underlying code for performance and maintainability.
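
For context on what client-side GZIP compression involves, here is a minimal standard-library sketch that compresses an NDJSON bulk body. `GzipBodySketch` is a hypothetical helper written for illustration; it is not the existing project code referred to above, which would be the preferred thing to reuse.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Sketch: gzip-compresses a bulk request body before it is sent. */
final class GzipBodySketch {
    private GzipBodySketch() {}

    /** Compresses the UTF-8 bytes of the given NDJSON payload. */
    static byte[] gzip(String ndjson) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Closing the GZIP stream writes the trailer, so read the bytes afterwards.
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(ndjson.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Inverse of gzip(), included only to check the round trip. */
    static String gunzip(byte[] compressed) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With `java.net.http.HttpClient`, the compressed bytes could be sent via `HttpRequest.BodyPublishers.ofByteArray(...)` together with a `Content-Encoding: gzip` header, assuming the target cluster accepts gzip-encoded request bodies.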

.timeout(Duration.ofSeconds(30));

if (authHeader != null) {
requestBuilder.header("Authorization", authHeader);
Member

Looks like this only supports basic auth. If you used our existing constructs in replay or RFS, this could support SigV4.

* Checks for partial failures in bulk response.
* Even if HTTP status is 200, individual documents may have failed.
*/
void checkPartialFailures(String responseBody, int totalDocs) {
Member

We already have existing logic in RFS that does this, as well as a bisect on failure to retry only the failed docs. It could be useful here.

public boolean watchTransforms;

@Parameter(names = {"--reporting-config"},
description = "Path to YAML configuration file for the validation reporting framework.")
Member

Are we getting value for the complexity of YAML here when the rest of the processes support JSON natively?

* Validates index template creation, document indexing, and mapping against a real OpenSearch instance.
*/
@Testcontainers
@Tag("longTest")
Member

Use isolatedTest for testcontainers

.connectTimeout(Duration.ofSeconds(10)).build();

@Container
static final OpensearchContainer<?> opensearch = new OpensearchContainer<>("opensearchproject/opensearch:2.19.1")
Member

Please see our helper methods for test containers which we use across our processes. Why is this using opensearch 2.19?

…config to JSON

Signed-off-by: Shivani - <scivani@amazon.com>
@jugal-chauhan
Collaborator

Checking in: are there more changes expected here? Should we move this PR into draft until these changes are made?

4 participants