[FEATURE] Improve EncryptorImpl with Asynchronous Handling for Scalability #3919
…ility Removed the usage of CountDownLatch. Every request is submitted and returns a Future. Added a list to track ongoing master key generation. If a tenant ID is in the list, its key generation is in progress, and the request waits until the other thread completes the key generation. Meanwhile, the system keeps accepting other requests: if the key is already available in the map, the request proceeds; otherwise, key generation for the new tenant starts in a different thread. So key generation for multiple tenants can happen simultaneously. Resolves opensearch-project#3510 Signed-off-by: Abdul Muneer Kolarkunnu <muneer.kolarkunnu@netapp.com>
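A minimal sketch of the per-tenant coordination idea in this commit message. It assumes a hypothetical `FutureKeyCache` class and a caller-supplied key generator, and neither appears in the PR. Instead of an explicit in-progress list plus waiting, it swaps in a `ConcurrentHashMap` of `CompletableFuture`s. `computeIfAbsent` schedules generation at most once per tenant. Concurrent callers for the same tenant share one future, and different tenants generate in parallel on the executor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch: one shared future per tenant instead of an "in progress" list plus waiting.
class FutureKeyCache {
    private final Map<String, CompletableFuture<String>> keyFutures = new ConcurrentHashMap<>();
    private final Executor executor;
    private final Function<String, String> keyGenerator; // assumed slow master-key creation

    FutureKeyCache(Executor executor, Function<String, String> keyGenerator) {
        this.executor = executor;
        this.keyGenerator = keyGenerator;
    }

    CompletableFuture<String> masterKey(String tenantId) {
        // computeIfAbsent is atomic per key: only the first caller for a tenant schedules generation,
        // and concurrent callers for the same tenant get the same future. Other tenants are not blocked.
        CompletableFuture<String> future = keyFutures.computeIfAbsent(
            tenantId, id -> CompletableFuture.supplyAsync(() -> keyGenerator.apply(id), executor));
        // Evict failed attempts so a later request can retry; successful futures act as the cache.
        future.whenComplete((key, error) -> {
            if (error != null) {
                keyFutures.remove(tenantId, future);
            }
        });
        return future;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        AtomicInteger generations = new AtomicInteger();
        FutureKeyCache cache = new FutureKeyCache(pool, tenant -> {
            generations.incrementAndGet();
            return "key-" + tenant;
        });
        List<CompletableFuture<String>> requests = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            requests.add(cache.masterKey(i % 2 == 0 ? "tenant-a" : "tenant-b"));
        }
        requests.forEach(CompletableFuture::join); // only the demo blocks, to print the result
        pool.shutdown();
        System.out.println(generations.get()); // prints 2: one generation per distinct tenant
    }
}
```

A completed future stays in the map, so it also serves as the key cache. Only failed attempts are evicted, which lets a later request retry.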
Awesome! Thanks for raising the PR. This will be a great improvement. I'll start actively reviewing it tomorrow. Can you also update the PR description with details on how you tested both single tenancy and multi-tenancy?
Codecov Report ❌ Patch coverage is
@@             Coverage Diff              @@
##               main    #3919      +/-   ##
============================================
- Coverage     77.32%   77.20%   -0.12%
- Complexity    11534    11537       +3
============================================
  Files           947      947
  Lines         51772    51868      +96
  Branches       6274     6275       +1
============================================
+ Hits          40031    40044      +13
- Misses         9091     9172      +81
- Partials       2650     2652       +2
common/src/main/java/org/opensearch/ml/common/connector/HttpConnector.java
common/src/main/java/org/opensearch/ml/common/connector/HttpConnector.java
ml-algorithms/src/main/java/org/opensearch/ml/engine/encryptor/EncryptorImpl.java
common/src/main/java/org/opensearch/ml/common/connector/McpConnector.java
I see the IT failed in CI.
@akolarkunnu I see the code coverage is low in this PR: https://app.codecov.io/gh/opensearch-project/ml-commons/pull/3919?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=checks&utm_campaign=pr+comments&utm_term=opensearch-project. Can you check and increase the code coverage?
Persistent review updated to latest commit 044ea8f
@ylwu-amzn I saw it. Is it a random failure or a consistent one? I am not able to reproduce it because this test case depends on an AWS access key. Is it possible to give me an environment where I can reproduce and debug the issue?
@zane-neo This looks like a false report. For example, GetTaskTransportAction.java shows 0% coverage, but there is a test class, GetTaskTransportActionTests.java. The same applies to all the other files that show 0% coverage. Also, the coverage task passed in the latest CI run.
@akolarkunnu I reran the failed checks, and the DCO check is still failing. Can you rebase your code to fix this? Thanks.
@zane-neo I tried "git rebase HEAD~42 --signoff", but it ended up with too many conflicts. @dhrubo-os mentioned there is a way to fix the DCO check from the maintainers' side. Can you please check that? It is hard to do from my end because it keeps producing a cycle of conflicts.
@zane-neo I resolved the DCO issue in the PR.
Persistent review updated to latest commit e2f3d9c
Merged the latest code. @dhrubo-os @zane-neo Can you please help me figure out whether there is a real issue to fix behind the RestBedRockInferenceIT > test_bedrock_multimodal_model failure?
@zane-neo, can you help find the root cause of this failure?
Persistent review updated to latest commit 5859d81
ml-algorithms/src/main/java/org/opensearch/ml/engine/encryptor/EncryptorImpl.java
I checked this, and it's an issue introduced in this PR. I left comments on the corresponding lines: https://github.com/opensearch-project/ml-commons/pull/3919/changes#r2916783932
@dhrubo-os I see the DCO check is still failing. @akolarkunnu, I understand it's hard to rebase so many commits. Maintainers can set the DCO check to pass, so that's fine. My suggestion when developing new features is to always rebase instead of merging upstream code, since rebasing has many benefits over merging, and to always add
Persistent review updated to latest commit 217d870
Thanks for the details. I corrected it by reverting the master key setting based on whether the tenant ID is null.
@zane-neo Yeah, every time @akolarkunnu pulls updates from main or pushes changes, the DCO check starts failing. My plan is to set the DCO check to pass from the maintainer side before merging the PR, so we don't need to worry about DCO for now (especially for this PR). But I agree with your instructions to @akolarkunnu!
Persistent review updated to latest commit 946dd49
Woohoo, this PR is merged! Thanks all for the patience in pushing this through! I hope to see fewer flaky tests on my downstream plugin :)
Description
Removed CountDownLatch completely. The implementation is now purely based on action listeners.
This change also fixes the issue of duplicate master key generation. If key generation for a tenant is in progress and another encryption/decryption request arrives with the same tenant ID, that request tries to generate another master key, because the earlier request is still creating the key and has not yet completed. As a result, a single tenant could end up with duplicate keys.
Fix: Tenants whose key generation is in progress are stored, together with the listeners waiting on them, in the map tenantWaitingListenerMap. Whenever key generation for a tenant completes or fails, all requestors waiting for that tenant's key are notified.
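The listener-based fix can be modeled roughly as follows. This is a simplified, hypothetical sketch rather than the actual EncryptorImpl code. `KeyListener` stands in for OpenSearch's `ActionListener`. `TenantKeyCoordinator`, `getMasterKey`, and the `keyGenerator` function are invented for illustration. Only the map name `tenantWaitingListenerMap` comes from the description. The first request for a tenant registers a waiting list and starts generation on an executor. Later requests for that tenant only append their listener. Every queued listener is notified once generation succeeds or fails.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch: at most one in-flight key generation per tenant and no blocking waits.
class TenantKeyCoordinator {
    private final Map<String, String> tenantMasterKeys = new ConcurrentHashMap<>();
    // Guarded by `lock`: tenant ID -> listeners waiting for that tenant's key generation.
    private final Map<String, List<KeyListener>> tenantWaitingListenerMap = new HashMap<>();
    private final Object lock = new Object();
    private final ExecutorService executor;
    private final Function<String, String> keyGenerator; // assumed slow: creates and persists a key
    final AtomicInteger generations = new AtomicInteger(); // counts attempts, for the demo/test

    TenantKeyCoordinator(ExecutorService executor, Function<String, String> keyGenerator) {
        this.executor = executor;
        this.keyGenerator = keyGenerator;
    }

    // tenantId must be non-null here because ConcurrentHashMap rejects null keys.
    void getMasterKey(String tenantId, KeyListener listener) {
        String cached = tenantMasterKeys.get(tenantId); // fast path: key already generated
        boolean startGeneration = false;
        if (cached == null) {
            synchronized (lock) {
                cached = tenantMasterKeys.get(tenantId); // re-check: a generation may just have finished
                if (cached == null) {
                    List<KeyListener> waiting = tenantWaitingListenerMap.get(tenantId);
                    if (waiting == null) { // first request for this tenant starts the generation
                        waiting = new ArrayList<>();
                        tenantWaitingListenerMap.put(tenantId, waiting);
                        startGeneration = true;
                    }
                    waiting.add(listener); // later requests just queue up
                }
            }
        }
        if (cached != null) {
            listener.onResponse(cached);
        } else if (startGeneration) {
            executor.execute(() -> generate(tenantId));
        }
    }

    private void generate(String tenantId) {
        String key = null;
        Exception error = null;
        try {
            generations.incrementAndGet();
            key = keyGenerator.apply(tenantId);
        } catch (Exception e) {
            error = e;
        }
        List<KeyListener> waiting;
        synchronized (lock) {
            if (key != null) tenantMasterKeys.put(tenantId, key); // publish before releasing waiters
            waiting = tenantWaitingListenerMap.remove(tenantId);
        }
        for (KeyListener l : waiting) { // notify outside the lock, on success or failure
            if (key != null) l.onResponse(key);
            else l.onFailure(error != null ? error : new IllegalStateException("no key for " + tenantId));
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        TenantKeyCoordinator coordinator = new TenantKeyCoordinator(pool, tenant -> {
            try { Thread.sleep(50); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            return "key-" + tenant;
        });
        CountDownLatch done = new CountDownLatch(20); // only the demo waits; the coordinator never blocks
        for (int i = 0; i < 20; i++) {
            coordinator.getMasterKey(i % 2 == 0 ? "tenant-a" : "tenant-b",
                KeyListener.of(k -> done.countDown(), e -> done.countDown()));
        }
        done.await(5, TimeUnit.SECONDS);
        pool.shutdown();
        System.out.println(coordinator.generations.get() + " generations for 20 requests");
    }
}

// Hypothetical stand-in with the same two callbacks as OpenSearch's ActionListener.
interface KeyListener {
    void onResponse(String masterKey);
    void onFailure(Exception e);

    static KeyListener of(Consumer<String> ok, Consumer<Exception> err) {
        return new KeyListener() {
            @Override public void onResponse(String masterKey) { ok.accept(masterKey); }
            @Override public void onFailure(Exception e) { err.accept(e); }
        };
    }
}
```

A single lock makes two steps atomic together: checking whether a generation is already running, and registering the listener. Re-reading the key cache under that lock closes the window in which a generation finishes between the fast-path read and the registration. Listeners are invoked outside the lock, so slow callbacks cannot stall requests for other tenants.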
Both the encrypt and decrypt APIs now accept a list of Strings to encrypt/decrypt. Previously, they accepted a single String.
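Below is a rough sketch of what a list-based API can look like once the tenant's key is available. The key is resolved once and applied to every item, and results come back in input order. The class and method names are hypothetical. AES-GCM from the JDK's `javax.crypto` package is swapped in purely for illustration and is not taken from EncryptorImpl. In the listener-based design, a call like `encryptAll` would presumably run inside the callback that receives the master key.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical sketch: one key for the whole list, one ciphertext per input, same order.
// AES-GCM from javax.crypto is swapped in for illustration; it is not EncryptorImpl's code.
class BatchCrypto {
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;
    private static final SecureRandom RANDOM = new SecureRandom();

    static SecretKey newKey() {
        try {
            KeyGenerator kg = KeyGenerator.getInstance("AES");
            kg.init(256);
            return kg.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("key generation failed", e);
        }
    }

    // Each output is Base64(iv || ciphertext+tag); a fresh random IV is used for every item.
    static List<String> encryptAll(List<String> plainTexts, SecretKey key) {
        List<String> out = new ArrayList<>(plainTexts.size());
        try {
            for (String text : plainTexts) {
                byte[] iv = new byte[IV_BYTES];
                RANDOM.nextBytes(iv);
                Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
                cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
                byte[] ct = cipher.doFinal(text.getBytes(StandardCharsets.UTF_8));
                byte[] blob = new byte[IV_BYTES + ct.length];
                System.arraycopy(iv, 0, blob, 0, IV_BYTES);
                System.arraycopy(ct, 0, blob, IV_BYTES, ct.length);
                out.add(Base64.getEncoder().encodeToString(blob));
            }
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
        return out;
    }

    static List<String> decryptAll(List<String> cipherTexts, SecretKey key) {
        List<String> out = new ArrayList<>(cipherTexts.size());
        try {
            for (String encoded : cipherTexts) {
                byte[] blob = Base64.getDecoder().decode(encoded);
                Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
                // The first IV_BYTES bytes are the IV; the rest is ciphertext plus auth tag.
                cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, blob, 0, IV_BYTES));
                byte[] pt = cipher.doFinal(blob, IV_BYTES, blob.length - IV_BYTES);
                out.add(new String(pt, StandardCharsets.UTF_8));
            }
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
        return out;
    }

    public static void main(String[] args) {
        SecretKey key = newKey(); // stand-in for the tenant's master key, resolved once per batch
        List<String> encrypted = encryptAll(List.of("secret-1", "secret-2"), key);
        System.out.println(decryptAll(encrypted, key)); // prints [secret-1, secret-2]
    }
}
```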
Testing:
Added more test cases covering multi-threaded use cases.
Also manually tested single-tenant and multi-tenant use cases:
Commands used for testing (e.g., multi-tenant):
Register the model:
curl -XPOST "http://localhost:9200/_plugins/_ml/models/_register" \
-H 'Content-Type: application/json' \
-H 'x-tenant-id: 1234567' \
-d '{
"name": "My OpenAI model: gpt-5",
"function_name": "remote",
"description": "test model",
"connector": {
"name": "My openai connector: gpt-5",
"description": "The connector to openai chat model",
"version": 1,
"protocol": "http",
"parameters": {
"model": "gpt-5"
},
"credential": {
"openAI_key": "..."
},
"actions": [
{
"action_type": "predict",
"method": "POST",
"url": "url",
"headers": {
"Authorization": "Bearer ${credential.openAI_key}"
},
"request_body": "{ \"model\": \"${parameters.model}\", \"messages\": [{\"role\":\"developer\",\"content\":\"${parameters.system_prompt}\"},${parameters._chat_history:-}{\"role\":\"user\",\"content\":\"${parameters.user_prompt}\"}${parameters._interactions:-}], \"user\": \"abdulmun\", \"reasoning_effort\":\"minimal\"${parameters.tool_configs:-}}"
}
]
}
}'
Invoking PREDICT API:
curl -XPOST "http://localhost:9200/_plugins/_ml/models/3GTtrJwBgpGVOGjb9j0-/_predict" \
-H 'Content-Type: application/json' \
-H 'x-tenant-id: 1234567' \
-d '{
"parameters": {
"user_prompt": "What is 2+2?",
"system_prompt": "You are a helpful assistant."
}
}'
Agents with MCP Connector:
Creating MCP Connector:
curl -XPOST "http://localhost:9200/_plugins/_ml/connectors/_create" -H 'Content-Type: application/json' -d'
{
"name": "My MCP Connector",
"description": "The connector to MCP server",
"version": 1,
"protocol": "mcp_streamable_http",
"credential":{
},
"parameters": {
"endpoint": "/mcp"
},
"url": "https://mcp.api.coingecko.com",
"headers": {
"Authorization": "Bearer ${credential.access}",
"Content-Type": "application/json"
}
}
'
Creating an agent with the MCP connector and the model created above:
curl -XPOST "http://localhost:9200/_plugins/_ml/agents/_register" -H 'Content-Type: application/json' -d'
{
"name": "No cache Claude",
"type": "conversational",
"description": "Use this for Agentic Search",
"llm": {
"model_id": "0dBYr5wBiCznJrIKg6DB",
"parameters": {
"max_iteration": 15
}
},
"memory": {
"type": "conversation_index"
},
"parameters": {
"_llm_interface": "openai/v1/chat/completions",
"mcp_connectors":[
{
"mcp_connector_id": "z9BXr5wBiCznJrIKw6BE"
}
]
},
"tools": [
{
"type": "QueryPlanningTool"
},
{
"type": "ListIndexTool"
},
{
"type": "IndexMappingTool"
}
],
"app_type": "os_chat"
}
'
Also tested updating the model:
curl -X PUT "http://localhost:9200/_plugins/_ml/models/rxnetpwBy1_WPefbq2da" \
-H "Content-Type: application/json" -d '{
"connector": {
"name": "My openai connector: gpt-5",
"description": "The connector to openai chat model",
"version": 1,
"protocol": "http",
"parameters": {
"model": "gpt-5"
},
"credential": {
"openAI_key": "openAI_key"
},
"actions": [
{
"action_type": "predict",
"method": "POST",
"url": "url",
"headers": {
"Authorization": "Bearer ${credential.openAI_key}"
},
"request_body": "{ \"model\": \"${parameters.model}\", \"messages\": [{\"role\":\"developer\",\"content\":\"${parameters.system_prompt}\"},${parameters._chat_history:-}{\"role\":\"user\",\"content\":\"${parameters.user_prompt}\"}${parameters._interactions:-}], \"user\": \"abdulmun\", \"reasoning_effort\":\"minimal\"${parameters.tool_configs:-}}"
}
]
}
}'
Related Issues
Resolves #3510
Check List
--signoff.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.