Skip to content

Fail fast when tool is missing input_schema in function calling#4779

Open
ohltyler wants to merge 4 commits intoopensearch-project:mainfrom
ohltyler:fix/default-input-schema-for-tools
Open

Fail fast when tool is missing input_schema in function calling#4779
ohltyler wants to merge 4 commits intoopensearch-project:mainfrom
ohltyler:fix/default-input-schema-for-tools

Conversation

@ohltyler
Copy link
Copy Markdown
Member

@ohltyler ohltyler commented Apr 4, 2026

Description

When a conversational agent builds the function calling payload (e.g., for Bedrock Converse or OpenAI), each tool must have an input_schema attribute so the LLM knows what parameters to pass. If input_schema is missing, the ${tool.attributes.input_schema} placeholder is left unresolved, producing invalid JSON that the LLM provider rejects with an opaque Invalid payload error.

Fix

Throw a clear IllegalArgumentException in AgentUtils.addToolsToFunctionCalling() when a tool is missing input_schema, naming the tool and the missing attribute. This fails fast with an actionable error message instead of producing silent failures.

Tools should define input_schema either:

  1. In the tool implementation (as built-in ml-commons tools and skills plugin tools already do)
  2. At agent registration time via the tool's attributes override

Testing

  • Unit tests for both cases: null attributes and non-null attributes missing input_schema — both verify the exception is thrown with the correct tool name and attribute name
  • Existing testAddToolsToFunctionCalling updated to include input_schema in mock tool attributes

Issues Resolved

N/A - discovered during investigation of VectorDBTool support for agentic search

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 4, 2026

PR Reviewer Guide 🔍

(Review updated until commit 3fa81f5)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Breaking Change

The new validation throws an IllegalArgumentException when input_schema is missing from tool attributes. This is a breaking change for any existing tools (e.g., built-in ml-commons tools or third-party tools) that do not currently define input_schema in their attributes. It should be verified that ALL tools used in function calling contexts already have input_schema defined, otherwise this will break existing deployments silently at runtime.

if (attributes == null || !attributes.containsKey(INPUT_SCHEMA)) {
    throw new IllegalArgumentException(
        "Tool ["
            + toolName
            + "] is missing ["
            + INPUT_SCHEMA
            + "] in its attributes. "
            + "All tools used with function calling must define an input_schema."
    );
}
Redundant Check

After the early-exit validation on line 244 ensures attributes is non-null and contains INPUT_SCHEMA, the subsequent Gemini check on line 258 redundantly re-checks attributes.containsKey(INPUT_SCHEMA). This second check is always true at that point and can be simplified.

if (parameters.containsKey("gemini.schema.cleaner") && attributes.containsKey(INPUT_SCHEMA)) {
    String schema = String.valueOf(attributes.get(INPUT_SCHEMA));
    String cleanedSchema = removeAdditionalPropertiesFromSchema(schema);
    toolParams.put("attributes.input_schema_cleaned", cleanedSchema);
}

ohltyler added a commit to ohltyler/ml-commons that referenced this pull request Apr 4, 2026
Signed-off-by: Tyler Ohlsen <ohltyler@amazon.com>
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 4, 2026

PR Code Suggestions ✨

Latest suggestions up to 3fa81f5

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Validate against null input_schema value

The validation should also check that the input_schema value is not null, since
containsKey returns true even if the key maps to a null value. A null input_schema
would cause a NullPointerException later when String.valueOf is called or when the
substitutor processes it.

ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/AgentUtils.java [244-253]

-if (attributes == null || !attributes.containsKey(INPUT_SCHEMA)) {
+if (attributes == null || !attributes.containsKey(INPUT_SCHEMA) || attributes.get(INPUT_SCHEMA) == null) {
     throw new IllegalArgumentException(
         "Tool ["
             + toolName
             + "] is missing ["
             + INPUT_SCHEMA
             + "] in its attributes. "
             + "All tools used with function calling must define an input_schema."
     );
 }
Suggestion importance[1-10]: 6

__

Why: The suggestion is valid - containsKey returns true even when the value is null, and a null input_schema could cause issues downstream when String.valueOf is called or during substitution. However, this is a minor edge case since in practice input_schema values are set as non-null strings.

Low

Previous suggestions

Suggestions up to commit 3a1a733
CategorySuggestion                                                                                                                                    Impact
Possible issue
Validate against null input_schema value

The validation should also check that the input_schema value is not null, as
containsKey returns true even when the key maps to a null value. A null input_schema
would cause a NullPointerException later when String.valueOf is called or when the
substitutor processes it.

ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/AgentUtils.java [244-249]

-if (attributes == null || !attributes.containsKey(INPUT_SCHEMA)) {
+if (attributes == null || !attributes.containsKey(INPUT_SCHEMA) || attributes.get(INPUT_SCHEMA) == null) {
     throw new IllegalArgumentException(
         "Tool [" + toolName + "] is missing [" + INPUT_SCHEMA + "] in its attributes. "
             + "All tools used with function calling must define an input_schema."
     );
 }
Suggestion importance[1-10]: 6

__

Why: The suggestion is valid - containsKey returns true even when the value is null, and String.valueOf(null) would return the string "null" rather than throwing an NPE, but this could still cause incorrect behavior. Adding a null check for the INPUT_SCHEMA value improves robustness and provides a clearer error message.

Low
Suggestions up to commit 7b8e6c5
CategorySuggestion                                                                                                                                    Impact
General
Consolidate split attribute-handling logic blocks

When attributes is not null but doesn't contain INPUT_SCHEMA, the default schema is
set first, but then the loop over attributes will not overwrite it (since the key is
absent). However, if attributes is not null and does contain INPUT_SCHEMA, the
default is correctly skipped. The logic is correct but the order could cause
confusion — more critically, if attributes contains INPUT_SCHEMA, the default is not
added, which is correct. But if attributes is null, the default is added and then
the if (attributes != null) block is skipped, which is also correct. The logic is
sound, but consider combining the two if blocks to make the intent clearer and avoid
potential future bugs from the split logic.

ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/AgentUtils.java [245-257]

-if (attributes == null || !attributes.containsKey(INPUT_SCHEMA)) {
-    toolParams.put("attributes." + INPUT_SCHEMA, DEFAULT_INPUT_SCHEMA);
-}
 if (attributes != null) {
     for (String key : attributes.keySet()) {
         toolParams.put("attributes." + key, attributes.get(key));
     }
+    if (!attributes.containsKey(INPUT_SCHEMA)) {
+        toolParams.put("attributes." + INPUT_SCHEMA, DEFAULT_INPUT_SCHEMA);
+    }
+    // For Gemini, clean input_schema to remove additionalProperties
+    if (parameters.containsKey("gemini.schema.cleaner") && attributes.containsKey(INPUT_SCHEMA)) {
+        String schema = String.valueOf(attributes.get(INPUT_SCHEMA));
+        String cleanedSchema = removeAdditionalPropertiesFromSchema(schema);
+        toolParams.put("attributes.input_schema_cleaned", cleanedSchema);
+    }
+} else {
+    toolParams.put("attributes." + INPUT_SCHEMA, DEFAULT_INPUT_SCHEMA);
+}
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies that the split if blocks can be confusing and consolidates them into a cleaner structure. The improved code is logically equivalent but more readable and less prone to future bugs from the separated logic.

Low
Ensure default schema string stays consistent with object form

The DEFAULT_INPUT_SCHEMA is a raw JSON string that is inserted directly into tool
parameters. If this string is later processed by a JSON parser or template engine
that expects a parsed object rather than a string, it may cause issues. Consider
whether this value should be stored as a pre-parsed structure or validated to ensure
it remains consistent with the emptySchema used in wrapFrontendToolsAsToolObjects.

ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/AgentUtils.java [157]

 private static final String DEFAULT_INPUT_SCHEMA = "{\"type\":\"object\",\"properties\":{}}";
+// Note: Must remain consistent with the emptySchema in wrapFrontendToolsAsToolObjects:
+// Map.of("type", "object", "properties", Map.of())
Suggestion importance[1-10]: 1

__

Why: The suggestion only asks to add a comment and doesn't change any logic. The existing_code and improved_code are functionally identical, with only a comment added, which doesn't warrant a meaningful score.

Low
Suggestions up to commit a5cbfb0
CategorySuggestion                                                                                                                                    Impact
General
Verify consistent value types for template substitution

The default input_schema value is stored as a raw JSON string, but other attribute
values from attributes.get(key) are stored as their original object types. If the
downstream template substitution expects consistent types (e.g., always a string or
always an object), mixing a raw JSON string here with other types could cause
inconsistencies or rendering issues. Ensure the type used for the default value
matches what the template engine expects for input_schema.

ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/AgentUtils.java [244]

+toolParams.put("attributes.input_schema", "{\"type\":\"object\",\"properties\":{}}");
 
-
Suggestion importance[1-10]: 3

__

Why: The concern about type consistency between the default input_schema string and other attribute values is valid in principle, but the improved_code is identical to the existing_code, making this only a verification suggestion with no concrete fix provided.

Low
Ensure real input_schema overrides default value

When attributes is non-null but lacks input_schema, the default value is added
first, but then the loop over attributes will not overwrite it since input_schema is
absent. However, if attributes does contain input_schema, the default is never set —
this is correct. But if attributes is non-null and contains input_schema, the
default should not be set, and the existing logic handles that. The real issue is
ordering: the default input_schema should be placed so that a real input_schema from
attributes (if present) can overwrite it. Consider inserting the default before
iterating attributes, and always iterating attributes when non-null, so a real
input_schema value takes precedence.

ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/AgentUtils.java [243-250]

+if (attributes == null || !attributes.containsKey("input_schema")) {
+    toolParams.put("attributes.input_schema", "{\"type\":\"object\",\"properties\":{}}");
+}
+if (attributes != null) {
+    for (String key : attributes.keySet()) {
+        toolParams.put("attributes." + key, attributes.get(key));
+    }
 
-
Suggestion importance[1-10]: 2

__

Why: The suggestion claims there's an ordering issue, but the existing code already handles this correctly: the default is only set when attributes is null or doesn't contain input_schema, so a real input_schema from attributes would never be overwritten. The improved_code is identical to the existing_code, making this suggestion a no-op.

Low
Suggestions up to commit 07c7e88
CategorySuggestion                                                                                                                                    Impact
General
Strengthen assertion to verify complete default schema

The assertion only checks for a partial string match and does not verify the
complete default schema value {"type":"object","properties":{}}. A more precise
assertion would confirm the full expected schema is present, preventing regressions
where the schema is partially correct but malformed.

ml-algorithms/src/test/java/org/opensearch/ml/engine/algorithms/agent/AgentUtilsTest.java [1276]

-Assert.assertTrue(result.contains("\"inputSchema\":{\"json\":{\"type\":\"object\""));
+Assert.assertTrue(result.contains("\"inputSchema\":{\"json\":{\"type\":\"object\",\"properties\":{}}"));
Suggestion importance[1-10]: 5

__

Why: The improved assertion checks for the full default schema string {"type":"object","properties":{}} rather than just a prefix, making the test more precise and better at catching regressions where the schema is partially correct but incomplete.

Low
Replace misleading putIfAbsent with put

Using putIfAbsent here is misleading since the map is freshly created and the key
"attributes.input_schema" will never already exist at this point — put would be
clearer. More importantly, when attributes is non-null but lacks "input_schema", the
default is inserted first, but then the loop below uses put (not putIfAbsent), which
would never overwrite it anyway since the key isn't in attributes. However, if
attributes contains "input_schema", the condition correctly skips the default, so
the logic is sound. Consider using put instead of putIfAbsent for clarity.

ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/AgentUtils.java [243-250]

 if (attributes == null || !attributes.containsKey("input_schema")) {
-    toolParams.putIfAbsent("attributes.input_schema", "{\"type\":\"object\",\"properties\":{}}");
+    toolParams.put("attributes.input_schema", "{\"type\":\"object\",\"properties\":{}}");
 }
 if (attributes != null) {
     for (String key : attributes.keySet()) {
         toolParams.put("attributes." + key, attributes.get(key));
     }
Suggestion importance[1-10]: 4

__

Why: The suggestion is technically correct — since toolParams is freshly created, putIfAbsent and put behave identically here, so using put is clearer. However, this is a minor readability improvement with no functional impact.

Low
Suggestions up to commit 4a8f5a2
CategorySuggestion                                                                                                                                    Impact
General
Strengthen assertion to verify complete default schema

The assertion only checks for a partial string match, which may pass even if the
JSON is malformed or incomplete. The test should also verify that the full default
schema {"type":"object","properties":{}} is present in the result to ensure the
entire default value is correctly substituted.

ml-algorithms/src/test/java/org/opensearch/ml/engine/algorithms/agent/AgentUtilsTest.java [1276]

-Assert.assertTrue(result.contains("\"inputSchema\":{\"json\":{\"type\":\"object\""));
+Assert.assertTrue(result.contains("\"inputSchema\":{\"json\":{\"type\":\"object\",\"properties\":{}}"));
Suggestion importance[1-10]: 5

__

Why: The existing assertion only checks for a partial match of the JSON schema. Verifying the complete {"type":"object","properties":{}} value ensures the full default schema is correctly substituted, making the test more robust and meaningful.

Low
Reorder default schema insertion after attribute loop

When attributes is not null but doesn't contain input_schema, the default value is
added first via putIfAbsent, but then the loop uses put (not putIfAbsent), which
could overwrite the default if a key named input_schema is added concurrently or in
a future code path. More critically, if attributes contains an input_schema key, the
condition correctly skips adding the default, but if it does NOT contain
input_schema, the default is added and then the loop runs without adding
input_schema — this is correct. However, the logic would be cleaner and safer by
adding the default after the loop only if input_schema was not found in attributes.

ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/AgentUtils.java [243-250]

-if (attributes == null || !attributes.containsKey("input_schema")) {
-    toolParams.putIfAbsent("attributes.input_schema", "{\"type\":\"object\",\"properties\":{}}");
-}
 if (attributes != null) {
     for (String key : attributes.keySet()) {
         toolParams.put("attributes." + key, attributes.get(key));
     }
+}
+if (attributes == null || !attributes.containsKey("input_schema")) {
+    toolParams.put("attributes.input_schema", "{\"type\":\"object\",\"properties\":{}}");
+}
Suggestion importance[1-10]: 4

__

Why: The current logic is actually correct: putIfAbsent is used when attributes lacks input_schema, and the loop only adds keys present in attributes (which won't include input_schema in that branch). The reordering suggestion changes putIfAbsent to put, which is a minor behavioral difference but the overall logic improvement is marginal since the existing code works correctly.

Low

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 4, 2026

Persistent review updated to latest commit 7a63fd6

@ohltyler ohltyler requested a deployment to ml-commons-cicd-env-require-approval April 4, 2026 16:32 — with GitHub Actions Waiting
@ohltyler ohltyler requested a deployment to ml-commons-cicd-env-require-approval April 4, 2026 16:32 — with GitHub Actions Waiting
@ohltyler ohltyler requested a deployment to ml-commons-cicd-env-require-approval April 4, 2026 16:32 — with GitHub Actions Waiting
@ohltyler ohltyler requested a deployment to ml-commons-cicd-env-require-approval April 4, 2026 16:32 — with GitHub Actions Waiting
@ohltyler ohltyler marked this pull request as draft April 4, 2026 16:42
ohltyler added a commit to ohltyler/ml-commons that referenced this pull request Apr 4, 2026
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 4, 2026

Persistent review updated to latest commit 4a8f5a2

@ohltyler ohltyler requested a deployment to ml-commons-cicd-env-require-approval April 4, 2026 16:55 — with GitHub Actions Waiting
@ohltyler ohltyler requested a deployment to ml-commons-cicd-env-require-approval April 4, 2026 16:55 — with GitHub Actions Waiting
@ohltyler ohltyler requested a deployment to ml-commons-cicd-env-require-approval April 4, 2026 16:55 — with GitHub Actions Waiting
@ohltyler ohltyler temporarily deployed to ml-commons-cicd-env-require-approval April 6, 2026 20:19 — with GitHub Actions Inactive
@ohltyler ohltyler temporarily deployed to ml-commons-cicd-env-require-approval April 6, 2026 20:19 — with GitHub Actions Inactive
@ohltyler ohltyler temporarily deployed to ml-commons-cicd-env-require-approval April 6, 2026 20:52 — with GitHub Actions Inactive
@ohltyler ohltyler temporarily deployed to ml-commons-cicd-env-require-approval April 6, 2026 20:52 — with GitHub Actions Inactive
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 6, 2026

Codecov Report

❌ Patch coverage is 45.45455% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.40%. Comparing base (70dc612) to head (3fa81f5).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
...nsearch/ml/engine/algorithms/agent/AgentUtils.java 45.45% 5 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##               main    #4779   +/-   ##
=========================================
  Coverage     77.40%   77.40%           
- Complexity    11900    11905    +5     
=========================================
  Files           963      963           
  Lines         53310    53311    +1     
  Branches       6500     6500           
=========================================
+ Hits          41266    41268    +2     
+ Misses         9287     9286    -1     
  Partials       2757     2757           
Flag Coverage Δ
ml-commons 77.40% <45.45%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

toolParams.put(DESCRIPTION, StringEscapeUtils.escapeJson(tool.getDescription()));
Map<String, ?> attributes = tool.getAttributes();
if (attributes == null || !attributes.containsKey("input_schema")) {
toolParams.put("attributes.input_schema", "{\"type\":\"object\",\"properties\":{}}");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we create constants for both key and value

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — extracted both into INPUT_SCHEMA and DEFAULT_INPUT_SCHEMA constants, and replaced all 5 occurrences of the "input_schema" string literal in the file.

Signed-off-by: Tyler Ohlsen <ohltyler@amazon.com>
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 7, 2026

Persistent review updated to latest commit 7b8e6c5

@ohltyler ohltyler temporarily deployed to ml-commons-cicd-env-require-approval April 7, 2026 17:33 — with GitHub Actions Inactive
@ohltyler ohltyler temporarily deployed to ml-commons-cicd-env-require-approval April 7, 2026 17:33 — with GitHub Actions Inactive
@ohltyler ohltyler temporarily deployed to ml-commons-cicd-env-require-approval April 7, 2026 17:33 — with GitHub Actions Inactive
@ohltyler ohltyler temporarily deployed to ml-commons-cicd-env-require-approval April 7, 2026 17:33 — with GitHub Actions Inactive
rithinpullela
rithinpullela previously approved these changes Apr 8, 2026
@ohltyler ohltyler temporarily deployed to ml-commons-cicd-env-require-approval April 8, 2026 00:39 — with GitHub Actions Inactive
@ohltyler ohltyler temporarily deployed to ml-commons-cicd-env-require-approval April 8, 2026 00:39 — with GitHub Actions Inactive
@pyek-bot
Copy link
Copy Markdown
Collaborator

pyek-bot commented Apr 8, 2026

This can lead to silent failures where the agent attempts to trigger the tool using the "default" input schema but might fail as the tool expects a different input schema. This failure would never be surfaced to the user.

I agree that some validation could be added and error messages cleaned up.

With that said, why not just set the input_schema during agent register? You can override the input_schema today, example: https://github.com/Jon-AtAWS/opensearch-ml-quickstart/blob/5bf3bdbd0fcb71842be7a3f91c7e72894fd626bb/examples/agent_tools.py#L49

@ohltyler ohltyler changed the title Fix function calling payload for tools without input_schema Fail fast when tool is missing input_schema in function calling Apr 8, 2026
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 8, 2026

Persistent review updated to latest commit 3a1a733

Instead of silently injecting an empty default schema when a tool is
missing input_schema, throw IllegalArgumentException with a message
naming the tool and the missing attribute. A default empty schema
misleads the LLM into calling the tool with no arguments, causing
silent failures downstream.

Signed-off-by: Tyler Ohlsen <ohltyler@amazon.com>
@ohltyler ohltyler force-pushed the fix/default-input-schema-for-tools branch from 3a1a733 to 3fa81f5 Compare April 8, 2026 20:34
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 8, 2026

Persistent review updated to latest commit 3fa81f5

@ohltyler ohltyler temporarily deployed to ml-commons-cicd-env-require-approval April 8, 2026 20:36 — with GitHub Actions Inactive
@ohltyler ohltyler temporarily deployed to ml-commons-cicd-env-require-approval April 8, 2026 20:36 — with GitHub Actions Inactive
@ohltyler ohltyler temporarily deployed to ml-commons-cicd-env-require-approval April 8, 2026 20:36 — with GitHub Actions Inactive
@ohltyler ohltyler temporarily deployed to ml-commons-cicd-env-require-approval April 8, 2026 20:36 — with GitHub Actions Inactive
@ohltyler
Copy link
Copy Markdown
Member Author

ohltyler commented Apr 8, 2026

This can lead to silent failures where the agent attempts to trigger the tool using the "default" input schema but might fail as the tool expects a different input schema. This failure would never be surfaced to the user.

I agree that some validation could be added and error messages cleaned up.

With that said, why not just set the input_schema during agent register? You can override the input_schema today, example: https://github.com/Jon-AtAWS/opensearch-ml-quickstart/blob/5bf3bdbd0fcb71842be7a3f91c7e72894fd626bb/examples/agent_tools.py#L49

@pyek-bot thought about this more, I agree that the silent failure is also not ideal. I've updated the PR to just throw an exception if the input schema is missing. I separately added default input schemas for the tools that we were using when testing agentic search with the vectordb tool here: opensearch-project/skills#722

Setting input schema during register is always an option, but not a good user experience and very error prone.

@ohltyler ohltyler temporarily deployed to ml-commons-cicd-env-require-approval April 9, 2026 17:36 — with GitHub Actions Inactive
@ohltyler ohltyler temporarily deployed to ml-commons-cicd-env-require-approval April 9, 2026 17:36 — with GitHub Actions Inactive
@pyek-bot
Copy link
Copy Markdown
Collaborator

@ohltyler Thanks for adding the input_schema to these tools. I have this issue created from a while back: opensearch-project/skills#638. Could you close this if your PR addresses it?

Regarding the exception, what happens now if a tool is registered without input_schema is the error returned back to the agent and we expect it to self correct? I don't think this error is surfaced to the user.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants