Fail fast when tool is missing input_schema in function calling by ohltyler · Pull Request #4779 · opensearch-project/ml-commons

ohltyler · 2026-04-04T16:28:48Z

Description

When a conversational agent builds the function calling payload (e.g., for Bedrock Converse or OpenAI), each tool must have an input_schema attribute so the LLM knows what parameters to pass. If input_schema is missing, the ${tool.attributes.input_schema} placeholder is left unresolved, producing invalid JSON that the LLM provider rejects with an opaque Invalid payload error.

Fix

Throw a clear IllegalArgumentException in AgentUtils.addToolsToFunctionCalling() when a tool is missing input_schema, naming the tool and the missing attribute. This fails fast with an actionable error message instead of producing silent failures.

Tools should define input_schema either:

In the tool implementation (as built-in ml-commons tools and skills plugin tools already do)
At agent registration time via the tool's attributes override

Testing

Unit tests for both cases: null attributes and non-null attributes missing input_schema — both verify the exception is thrown with the correct tool name and attribute name
Existing testAddToolsToFunctionCalling updated to include input_schema in mock tool attributes

Issues Resolved

N/A - discovered during investigation of VectorDBTool support for agentic search

github-actions · 2026-04-04T16:30:00Z

PR Reviewer Guide 🔍

(Review updated until commit `3fa81f5`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review Breaking Change The new validation throws an `IllegalArgumentException` when `input_schema` is missing from tool attributes. This is a breaking change for any existing tools (e.g., built-in ml-commons tools or third-party tools) that do not currently define `input_schema` in their attributes. It should be verified that ALL tools used in function calling contexts already have `input_schema` defined, otherwise this will break existing deployments silently at runtime. if (attributes == null \|\| !attributes.containsKey(INPUT_SCHEMA)) { throw new IllegalArgumentException( "Tool [" + toolName + "] is missing [" + INPUT_SCHEMA + "] in its attributes. " + "All tools used with function calling must define an input_schema." ); } Redundant Check After the early-exit validation on line 244 ensures `attributes` is non-null and contains `INPUT_SCHEMA`, the subsequent Gemini check on line 258 redundantly re-checks `attributes.containsKey(INPUT_SCHEMA)`. This second check is always true at that point and can be simplified. if (parameters.containsKey("gemini.schema.cleaner") && attributes.containsKey(INPUT_SCHEMA)) { String schema = String.valueOf(attributes.get(INPUT_SCHEMA)); String cleanedSchema = removeAdditionalPropertiesFromSchema(schema); toolParams.put("attributes.input_schema_cleaned", cleanedSchema); }

Signed-off-by: Tyler Ohlsen <ohltyler@amazon.com>

github-actions · 2026-04-04T16:30:19Z

PR Code Suggestions ✨

Latest suggestions up to 3fa81f5

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Validate against null input_schema value The validation should also check that the `input_schema` value is not null, since `containsKey` returns true even if the key maps to a null value. A null `input_schema` would cause a `NullPointerException` later when `String.valueOf` is called or when the substitutor processes it. ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/AgentUtils.java [244-253] -if (attributes == null \|\| !attributes.containsKey(INPUT_SCHEMA)) { +if (attributes == null \|\| !attributes.containsKey(INPUT_SCHEMA) \|\| attributes.get(INPUT_SCHEMA) == null) { throw new IllegalArgumentException( "Tool [" + toolName + "] is missing [" + INPUT_SCHEMA + "] in its attributes. " + "All tools used with function calling must define an input_schema." ); } Suggestion importance[1-10]: 6 __ Why: The suggestion is valid - `containsKey` returns true even when the value is null, and a null `input_schema` could cause issues downstream when `String.valueOf` is called or during substitution. However, this is a minor edge case since in practice `input_schema` values are set as non-null strings.	Low

Previous suggestions

Suggestions up to commit 3a1a733

Category	Suggestion	Impact
Possible issue	Validate against null input_schema value The validation should also check that the `input_schema` value is not null, as `containsKey` returns true even when the key maps to a null value. A null `input_schema` would cause a `NullPointerException` later when `String.valueOf` is called or when the substitutor processes it. ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/AgentUtils.java [244-249] -if (attributes == null \|\| !attributes.containsKey(INPUT_SCHEMA)) { +if (attributes == null \|\| !attributes.containsKey(INPUT_SCHEMA) \|\| attributes.get(INPUT_SCHEMA) == null) { throw new IllegalArgumentException( "Tool [" + toolName + "] is missing [" + INPUT_SCHEMA + "] in its attributes. " + "All tools used with function calling must define an input_schema." ); } Suggestion importance[1-10]: 6 __ Why: The suggestion is valid - `containsKey` returns true even when the value is null, and `String.valueOf(null)` would return the string "null" rather than throwing an NPE, but this could still cause incorrect behavior. Adding a null check for the `INPUT_SCHEMA` value improves robustness and provides a clearer error message.	Low

Suggestions up to commit 7b8e6c5

Category	Suggestion	Impact
General	Consolidate split attribute-handling logic blocks When `attributes` is not null but doesn't contain `INPUT_SCHEMA`, the default schema is set first, but then the loop over `attributes` will not overwrite it (since the key is absent). However, if `attributes` is not null and does contain `INPUT_SCHEMA`, the default is correctly skipped. The logic is correct but the order could cause confusion — more critically, if `attributes` contains `INPUT_SCHEMA`, the default is not added, which is correct. But if `attributes` is null, the default is added and then the `if (attributes != null)` block is skipped, which is also correct. The logic is sound, but consider combining the two `if` blocks to make the intent clearer and avoid potential future bugs from the split logic. ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/AgentUtils.java [245-257] -if (attributes == null \|\| !attributes.containsKey(INPUT_SCHEMA)) { - toolParams.put("attributes." + INPUT_SCHEMA, DEFAULT_INPUT_SCHEMA); -} if (attributes != null) { for (String key : attributes.keySet()) { toolParams.put("attributes." + key, attributes.get(key)); } + if (!attributes.containsKey(INPUT_SCHEMA)) { + toolParams.put("attributes." + INPUT_SCHEMA, DEFAULT_INPUT_SCHEMA); + } + // For Gemini, clean input_schema to remove additionalProperties + if (parameters.containsKey("gemini.schema.cleaner") && attributes.containsKey(INPUT_SCHEMA)) { + String schema = String.valueOf(attributes.get(INPUT_SCHEMA)); + String cleanedSchema = removeAdditionalPropertiesFromSchema(schema); + toolParams.put("attributes.input_schema_cleaned", cleanedSchema); + } +} else { + toolParams.put("attributes." + INPUT_SCHEMA, DEFAULT_INPUT_SCHEMA); +} Suggestion importance[1-10]: 6 __ Why: The suggestion correctly identifies that the split `if` blocks can be confusing and consolidates them into a cleaner structure. The improved code is logically equivalent but more readable and less prone to future bugs from the separated logic.	Low
General	Ensure default schema string stays consistent with object form The `DEFAULT_INPUT_SCHEMA` is a raw JSON string that is inserted directly into tool parameters. If this string is later processed by a JSON parser or template engine that expects a parsed object rather than a string, it may cause issues. Consider whether this value should be stored as a pre-parsed structure or validated to ensure it remains consistent with the `emptySchema` used in `wrapFrontendToolsAsToolObjects`. ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/AgentUtils.java [157] private static final String DEFAULT_INPUT_SCHEMA = "{\"type\":\"object\",\"properties\":{}}"; +// Note: Must remain consistent with the emptySchema in wrapFrontendToolsAsToolObjects: +// Map.of("type", "object", "properties", Map.of()) Suggestion importance[1-10]: 1 __ Why: The suggestion only asks to add a comment and doesn't change any logic. The `existing_code` and `improved_code` are functionally identical, with only a comment added, which doesn't warrant a meaningful score.	Low

Suggestions up to commit a5cbfb0

Category	Suggestion	Impact
General	Verify consistent value types for template substitution The default `input_schema` value is stored as a raw JSON string, but other attribute values from `attributes.get(key)` are stored as their original object types. If the downstream template substitution expects consistent types (e.g., always a string or always an object), mixing a raw JSON string here with other types could cause inconsistencies or rendering issues. Ensure the type used for the default value matches what the template engine expects for `input_schema`. ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/AgentUtils.java [244] +toolParams.put("attributes.input_schema", "{\"type\":\"object\",\"properties\":{}}"); - Suggestion importance[1-10]: 3 __ Why: The concern about type consistency between the default `input_schema` string and other attribute values is valid in principle, but the `improved_code` is identical to the `existing_code`, making this only a verification suggestion with no concrete fix provided.	Low
General	Ensure real input_schema overrides default value When `attributes` is non-null but lacks `input_schema`, the default value is added first, but then the loop over `attributes` will not overwrite it since `input_schema` is absent. However, if `attributes` does contain `input_schema`, the default is never set — this is correct. But if `attributes` is non-null and contains `input_schema`, the default should not be set, and the existing logic handles that. The real issue is ordering: the default `input_schema` should be placed so that a real `input_schema` from `attributes` (if present) can overwrite it. Consider inserting the default before iterating attributes, and always iterating attributes when non-null, so a real `input_schema` value takes precedence. ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/AgentUtils.java [243-250] +if (attributes == null \|\| !attributes.containsKey("input_schema")) { + toolParams.put("attributes.input_schema", "{\"type\":\"object\",\"properties\":{}}"); +} +if (attributes != null) { + for (String key : attributes.keySet()) { + toolParams.put("attributes." + key, attributes.get(key)); + } - Suggestion importance[1-10]: 2 __ Why: The suggestion claims there's an ordering issue, but the existing code already handles this correctly: the default is only set when `attributes` is null or doesn't contain `input_schema`, so a real `input_schema` from `attributes` would never be overwritten. The `improved_code` is identical to the `existing_code`, making this suggestion a no-op.	Low

Suggestions up to commit 07c7e88

Category	Suggestion	Impact
General	Strengthen assertion to verify complete default schema The assertion only checks for a partial string match and does not verify the complete default schema value `{"type":"object","properties":{}}`. A more precise assertion would confirm the full expected schema is present, preventing regressions where the schema is partially correct but malformed. ml-algorithms/src/test/java/org/opensearch/ml/engine/algorithms/agent/AgentUtilsTest.java [1276] -Assert.assertTrue(result.contains("\"inputSchema\":{\"json\":{\"type\":\"object\"")); +Assert.assertTrue(result.contains("\"inputSchema\":{\"json\":{\"type\":\"object\",\"properties\":{}}")); Suggestion importance[1-10]: 5 __ Why: The improved assertion checks for the full default schema string `{"type":"object","properties":{}}` rather than just a prefix, making the test more precise and better at catching regressions where the schema is partially correct but incomplete.	Low
General	Replace misleading `putIfAbsent` with `put` Using `putIfAbsent` here is misleading since the map is freshly created and the key `"attributes.input_schema"` will never already exist at this point — `put` would be clearer. More importantly, when `attributes` is non-null but lacks `"input_schema"`, the default is inserted first, but then the loop below uses `put` (not `putIfAbsent`), which would never overwrite it anyway since the key isn't in `attributes`. However, if `attributes` contains `"input_schema"`, the condition correctly skips the default, so the logic is sound. Consider using `put` instead of `putIfAbsent` for clarity. ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/AgentUtils.java [243-250] if (attributes == null \|\| !attributes.containsKey("input_schema")) { - toolParams.putIfAbsent("attributes.input_schema", "{\"type\":\"object\",\"properties\":{}}"); + toolParams.put("attributes.input_schema", "{\"type\":\"object\",\"properties\":{}}"); } if (attributes != null) { for (String key : attributes.keySet()) { toolParams.put("attributes." + key, attributes.get(key)); } Suggestion importance[1-10]: 4 __ Why: The suggestion is technically correct — since `toolParams` is freshly created, `putIfAbsent` and `put` behave identically here, so using `put` is clearer. However, this is a minor readability improvement with no functional impact.	Low

Suggestions up to commit 4a8f5a2

Category	Suggestion	Impact
General	Strengthen assertion to verify complete default schema The assertion only checks for a partial string match, which may pass even if the JSON is malformed or incomplete. The test should also verify that the full default schema `{"type":"object","properties":{}}` is present in the result to ensure the entire default value is correctly substituted. ml-algorithms/src/test/java/org/opensearch/ml/engine/algorithms/agent/AgentUtilsTest.java [1276] -Assert.assertTrue(result.contains("\"inputSchema\":{\"json\":{\"type\":\"object\"")); +Assert.assertTrue(result.contains("\"inputSchema\":{\"json\":{\"type\":\"object\",\"properties\":{}}")); Suggestion importance[1-10]: 5 __ Why: The existing assertion only checks for a partial match of the JSON schema. Verifying the complete `{"type":"object","properties":{}}` value ensures the full default schema is correctly substituted, making the test more robust and meaningful.	Low
General	Reorder default schema insertion after attribute loop When `attributes` is not null but doesn't contain `input_schema`, the default value is added first via `putIfAbsent`, but then the loop uses `put` (not `putIfAbsent`), which could overwrite the default if a key named `input_schema` is added concurrently or in a future code path. More critically, if `attributes` contains an `input_schema` key, the condition correctly skips adding the default, but if it does NOT contain `input_schema`, the default is added and then the loop runs without adding `input_schema` — this is correct. However, the logic would be cleaner and safer by adding the default after the loop only if `input_schema` was not found in attributes. ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/AgentUtils.java [243-250] -if (attributes == null \|\| !attributes.containsKey("input_schema")) { - toolParams.putIfAbsent("attributes.input_schema", "{\"type\":\"object\",\"properties\":{}}"); -} if (attributes != null) { for (String key : attributes.keySet()) { toolParams.put("attributes." + key, attributes.get(key)); } +} +if (attributes == null \|\| !attributes.containsKey("input_schema")) { + toolParams.put("attributes.input_schema", "{\"type\":\"object\",\"properties\":{}}"); +} Suggestion importance[1-10]: 4 __ Why: The current logic is actually correct: `putIfAbsent` is used when `attributes` lacks `input_schema`, and the loop only adds keys present in `attributes` (which won't include `input_schema` in that branch). The reordering suggestion changes `putIfAbsent` to `put`, which is a minor behavioral difference but the overall logic improvement is marginal since the existing code works correctly.	Low

github-actions · 2026-04-04T16:31:20Z

Persistent review updated to latest commit 7a63fd6

This reverts commit 7a63fd6.

github-actions · 2026-04-04T16:55:42Z

Persistent review updated to latest commit 4a8f5a2

codecov · 2026-04-06T20:52:35Z

Codecov Report

❌ Patch coverage is 45.45455% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.40%. Comparing base (70dc612) to head (3fa81f5).
⚠️ Report is 3 commits behind head on main.

Files with missing lines	Patch %	Lines
...nsearch/ml/engine/algorithms/agent/AgentUtils.java	45.45%	5 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff            @@
##               main    #4779   +/-   ##
=========================================
  Coverage     77.40%   77.40%           
- Complexity    11900    11905    +5     
=========================================
  Files           963      963           
  Lines         53310    53311    +1     
  Branches       6500     6500           
=========================================
+ Hits          41266    41268    +2     
+ Misses         9287     9286    -1     
  Partials       2757     2757

Flag	Coverage Δ
ml-commons	`77.40% <45.45%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

owaiskazi19 · 2026-04-06T22:15:25Z

ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/AgentUtils.java

            toolParams.put(DESCRIPTION, StringEscapeUtils.escapeJson(tool.getDescription()));
            Map<String, ?> attributes = tool.getAttributes();
+            if (attributes == null || !attributes.containsKey("input_schema")) {
+                toolParams.put("attributes.input_schema", "{\"type\":\"object\",\"properties\":{}}");


Can we create constants for both key and value

Done — extracted both into INPUT_SCHEMA and DEFAULT_INPUT_SCHEMA constants, and replaced all 5 occurrences of the "input_schema" string literal in the file.

Signed-off-by: Tyler Ohlsen <ohltyler@amazon.com>

github-actions · 2026-04-07T17:33:07Z

Persistent review updated to latest commit 7b8e6c5

pyek-bot · 2026-04-08T00:48:49Z

This can lead to silent failures where the agent attempts to trigger the tool using the "default" input schema but might fail as the tool expects a different input schema. This failure would never be surfaced to the user.

I agree that some validation could be added and error messages cleaned up.

With that said, why not just set the input_schema during agent register? You can override the input_schema today, example: https://github.com/Jon-AtAWS/opensearch-ml-quickstart/blob/5bf3bdbd0fcb71842be7a3f91c7e72894fd626bb/examples/agent_tools.py#L49

github-actions · 2026-04-08T20:31:21Z

Persistent review updated to latest commit 3a1a733

Instead of silently injecting an empty default schema when a tool is missing input_schema, throw IllegalArgumentException with a message naming the tool and the missing attribute. A default empty schema misleads the LLM into calling the tool with no arguments, causing silent failures downstream. Signed-off-by: Tyler Ohlsen <ohltyler@amazon.com>

github-actions · 2026-04-08T20:35:48Z

Persistent review updated to latest commit 3fa81f5

ohltyler · 2026-04-08T20:38:36Z

This can lead to silent failures where the agent attempts to trigger the tool using the "default" input schema but might fail as the tool expects a different input schema. This failure would never be surfaced to the user.

I agree that some validation could be added and error messages cleaned up.

With that said, why not just set the input_schema during agent register? You can override the input_schema today, example: https://github.com/Jon-AtAWS/opensearch-ml-quickstart/blob/5bf3bdbd0fcb71842be7a3f91c7e72894fd626bb/examples/agent_tools.py#L49

@pyek-bot thought about this more, I agree that the silent failure is also not ideal. I've updated the PR to just throw an exception if the input schema is missing. I separately added default input schemas for the tools that we were using when testing agentic search with the vectordb tool here: opensearch-project/skills#722

Setting input schema during register is always an option, but not a good user experience and very error prone.

pyek-bot · 2026-04-11T00:53:17Z

@ohltyler Thanks for adding the input_schema to these tools. I have this issue created from a while back: opensearch-project/skills#638. Could you close this if your PR addresses it?

Regarding the exception, what happens now if a tool is registered without input_schema is the error returned back to the agent and we expect it to self correct? I don't think this error is surfaced to the user.

ohltyler requested review from HenryL27, Zhangxunmt, austintlee, b4sjoo, dhrubo-os, jngz-es, mingshl, model-collapse, pyek-bot, rbhavna, rithinpullela, sam-herman, xinyual, ylwu-amzn and zane-neo as code owners April 4, 2026 16:28

ohltyler added a commit to ohltyler/ml-commons that referenced this pull request Apr 4, 2026

Add CHANGELOG entry for opensearch-project#4779

7a63fd6

Signed-off-by: Tyler Ohlsen <ohltyler@amazon.com>

ohltyler requested a deployment to ml-commons-cicd-env-require-approval April 4, 2026 16:32 — with GitHub Actions Waiting

ohltyler marked this pull request as draft April 4, 2026 16:42

ohltyler mentioned this pull request Apr 4, 2026

Add input_schema to AbstractRetrieverTool and runtime param overrides to VectorDBTool opensearch-project/skills#722

Merged

ohltyler added a commit to ohltyler/ml-commons that referenced this pull request Apr 4, 2026

Revert "Add CHANGELOG entry for opensearch-project#4779"

4a8f5a2

This reverts commit 7a63fd6.

ohltyler requested a deployment to ml-commons-cicd-env-require-approval April 4, 2026 16:55 — with GitHub Actions Waiting

ohltyler temporarily deployed to ml-commons-cicd-env-require-approval April 6, 2026 20:19 — with GitHub Actions Inactive

ohltyler temporarily deployed to ml-commons-cicd-env-require-approval April 6, 2026 20:52 — with GitHub Actions Inactive

owaiskazi19 reviewed Apr 6, 2026

View reviewed changes

Extract input_schema key and default value into constants

7b8e6c5

Signed-off-by: Tyler Ohlsen <ohltyler@amazon.com>

ohltyler temporarily deployed to ml-commons-cicd-env-require-approval April 7, 2026 17:33 — with GitHub Actions Inactive

rithinpullela previously approved these changes Apr 8, 2026

View reviewed changes

ohltyler temporarily deployed to ml-commons-cicd-env-require-approval April 8, 2026 00:39 — with GitHub Actions Inactive

ohltyler dismissed rithinpullela’s stale review via 3a1a733 April 8, 2026 20:30

ohltyler changed the title ~~Fix function calling payload for tools without input_schema~~ Fail fast when tool is missing input_schema in function calling Apr 8, 2026

ohltyler force-pushed the fix/default-input-schema-for-tools branch from 3a1a733 to 3fa81f5 Compare April 8, 2026 20:34

ohltyler temporarily deployed to ml-commons-cicd-env-require-approval April 8, 2026 20:36 — with GitHub Actions Inactive

ohltyler temporarily deployed to ml-commons-cicd-env-require-approval April 9, 2026 17:36 — with GitHub Actions Inactive

Conversation

ohltyler commented Apr 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Fix

Testing

Issues Resolved

Uh oh!

github-actions bot commented Apr 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit 3fa81f5)

Uh oh!

github-actions bot commented Apr 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

github-actions bot commented Apr 4, 2026

Uh oh!

github-actions bot commented Apr 4, 2026

Uh oh!

codecov bot commented Apr 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

owaiskazi19 Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

ohltyler Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Apr 7, 2026

Uh oh!

pyek-bot commented Apr 8, 2026

Uh oh!

github-actions bot commented Apr 8, 2026

Uh oh!

github-actions bot commented Apr 8, 2026

Uh oh!

ohltyler commented Apr 8, 2026

Uh oh!

pyek-bot commented Apr 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

ohltyler commented Apr 4, 2026 •

edited

Loading

github-actions bot commented Apr 4, 2026 •

edited

Loading

(Review updated until commit `3fa81f5`)

github-actions bot commented Apr 4, 2026 •

edited

Loading

codecov bot commented Apr 6, 2026 •

edited

Loading