
add Yandex Cloud embeddings connector blueprint#4469

Open
mkhludnev wants to merge 6 commits into opensearch-project:main from mkhludnev:yc-conn-blueprint


@mkhludnev mkhludnev commented Nov 26, 2025

Description

This PR contributes a connector blueprint for Yandex Cloud embeddings.

Check List

  • [v] New functionality has been documented.
  • [v] Commits are signed per the DCO using --signoff.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Note for reviews

I've contributed to OpenSearch before.
And thanks for reviewing it!

Summary by CodeRabbit

  • Documentation
    • Fixed formatting for the Vertex AI embedding entry to improve readability in the docs.
    • Added a Yandex Cloud section with a step-by-step guide for integrating AI Studio embeddings, covering connector configuration, model registration and deployment, and sample inference requests/responses for easier setup and testing.


coderabbitai bot commented Nov 26, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

Fixed formatting in the standard blueprints README and added a new Yandex Cloud AI Studio embeddings legacy blueprint documenting connector configuration, model registration, deployment, and example inference payloads and responses.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation: Standard Blueprints README (`docs/remote_inference_blueprints/standard_blueprints/README.md`) | Removed an extra leading dash from the VertexAI embedding list item to correct formatting; added a Yandex Cloud section pointing to the legacy blueprint. |
| Blueprint Guide: Yandex Cloud Embedding (new) (`docs/remote_inference_blueprints/yandexcloud_connector_embedding_legacy_blueprint.md`) | Added a new legacy blueprint detailing cluster config, connector creation payload (modelUri, folder_id, credentials, embedding action), model group/model registration and deployment payloads, sample inference call, and example responses. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐇 I nibbled at docs and smoothed a stray dash,
A Yandex blueprint now in a neat little stash,
Connectors and payloads all lined in a row,
Models registered, ready to show,
Little rabbit hops — documentation glow.

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 inconclusive)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ❓ Inconclusive | The description is incomplete; it lacks the 'Related Issues' section and several checklist items (testing, API changes, documentation issue) are unchecked or missing. | Complete the description by adding a 'Related Issues' section and clarifying why certain checklist items are not applicable or were omitted. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: adding a Yandex Cloud embeddings connector blueprint. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@mkhludnev mkhludnev had a problem deploying to ml-commons-cicd-env-require-approval November 26, 2025 22:20 — with GitHub Actions Failure

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 87e742e and 8a05765.

📒 Files selected for processing (1)
  • docs/remote_inference_blueprints/standard_blueprints/yandexcloud_connector_embedding_standard_blueprint.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/remote_inference_blueprints/standard_blueprints/yandexcloud_connector_embedding_standard_blueprint.md

[grammar] ~120-~120: Ensure spelling is correct
Context: ...of life?" } } Sample response of Yadex Cloud AI Studio Embedding: json { ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: spotless
🔇 Additional comments (1)
docs/remote_inference_blueprints/standard_blueprints/yandexcloud_connector_embedding_standard_blueprint.md (1)

45-46: Verify pre/post-processing functions are correct for Yandex Cloud.

The connector references bedrock pre/post-processing functions, but this blueprint is for Yandex Cloud. Verify that these processing functions are:

  1. Generic/universal and work correctly with Yandex Cloud API responses, or
  2. Should be replaced with Yandex-specific processing functions.

If these are not the correct functions for Yandex Cloud, update them accordingly.

@mkhludnev mkhludnev had a problem deploying to ml-commons-cicd-env-require-approval November 26, 2025 22:29 — with GitHub Actions Failure

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8a05765 and 5d8dc2c.

📒 Files selected for processing (1)
  • docs/remote_inference_blueprints/standard_blueprints/yandexcloud_connector_embedding_standard_blueprint.md (1 hunks)
🔇 Additional comments (1)
docs/remote_inference_blueprints/standard_blueprints/yandexcloud_connector_embedding_standard_blueprint.md (1)

45-46: Clarify why Bedrock pre/post-processing functions are used for Yandex Cloud.

The pre/post-processing functions reference bedrock for a Yandex Cloud connector. Clarify whether Yandex's request/response format is compatible with Bedrock's processing, or if Yandex-specific processing functions should be used instead.

If compatibility is intentional, add a brief comment explaining why Bedrock functions are appropriate here. If these should be Yandex-specific, update them accordingly.

mingshl previously approved these changes Jan 6, 2026
@mkhludnev mkhludnev temporarily deployed to ml-commons-cicd-env-require-approval January 6, 2026 23:23 — with GitHub Actions Inactive
@mkhludnev mkhludnev had a problem deploying to ml-commons-cicd-env-require-approval January 7, 2026 00:31 — with GitHub Actions Failure
@mkhludnev mkhludnev had a problem deploying to ml-commons-cicd-env-require-approval January 7, 2026 08:08 — with GitHub Actions Failure

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @docs/remote_inference_blueprints/standard_blueprints/README.md:
- Around line 50-54: The "Yandex Cloud:" provider entry in the README uses a
trailing colon which is inconsistent with other provider headings (Bedrock,
Cohere, OpenAI, VertexAI); edit the README to remove the colon so the line reads
"Yandex Cloud" and keep the existing link text and indentation unchanged to
match the established pattern.
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between db55f4b and 3824d45.

📒 Files selected for processing (2)
  • docs/remote_inference_blueprints/standard_blueprints/README.md
  • docs/remote_inference_blueprints/yandexcloud_connector_embedding_legacy_blueprint.md
✅ Files skipped from review due to trivial changes (1)
  • docs/remote_inference_blueprints/yandexcloud_connector_embedding_legacy_blueprint.md
🔇 Additional comments (1)
docs/remote_inference_blueprints/standard_blueprints/README.md (1)

50-51: VertexAI formatting corrected.

The formatting fix on the VertexAI entry (removing the extra leading dash) aligns it properly with other provider entries in the legacy blueprints section. The change looks good.

@mkhludnev
Author

@mingshl may I ask for your attention?

mingshl previously approved these changes Jan 28, 2026
Collaborator

@mingshl mingshl left a comment


LGTM, thanks @mkhludnev .

@b4sjoo can you also take a look?

@mkhludnev
Author

@b4sjoo please elaborate.


github-actions bot commented Mar 3, 2026

PR Reviewer Guide 🔍

(Review updated until commit a79f3bb)

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 Security concerns

Sensitive information exposure:
The documentation includes example values for connector_id, model_group_id, model_id, and task_id (e.g., CTEou5oBdUNOOrVArUAU, 4THNtZoBdUNOOrVAzj_V). While these appear to be example/placeholder values in a documentation context, users should be clearly warned not to use or expose real IDs. The api_key placeholder <API-KEY> is appropriately marked, but the note about bearer tokens being valid for ~12 hours could encourage insecure practices if users hardcode short-lived tokens.

✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Misleading Pre/Post Processing

The connector uses connector.pre_process.bedrock.embedding and connector.post_process.bedrock.embedding functions for a Yandex Cloud connector. Using Bedrock-specific processing functions for a non-Bedrock service may cause incorrect request/response handling. The request/response format of Yandex Cloud's textEmbedding API should be verified to ensure compatibility with these Bedrock processing functions.

"pre_process_function": "connector.pre_process.bedrock.embedding",
"post_process_function": "connector.post_process.bedrock.embedding"
Missing Deploy Step

The section header says "Register model & deploy model" but there is no deploy step shown (e.g., POST /_plugins/_ml/models/<model_id>/_deploy). Users following this guide will not know how to deploy the registered model before running inference.

## 3. Register model & deploy model:

```json
POST /_plugins/_ml/model_groups/_register
{
  "name": "yc_remote_model_group",
  "description": "A model group for external YC AI Studio models"
}
```

Sample response:

```json
{
    "model_group_id": "4THNtZoBdUNOOrVAzj_V"
}
```

```json
POST /_plugins/_ml/models/_register
{
    "name": "yc-embedding",
    "function_name": "remote",
    "model_group_id": "4THNtZoBdUNOOrVAzj_V",
    "description": "YC embedding model",
    "connector_id": "CTEou5oBdUNOOrVArUAU"
}
```

Sample response:

```json
{
  "task_id": "5THZtZoBdUNOOrVAEj_I",
  "status": "CREATED",
  "model_id": "CzEou5oBdUNOOrVA10Db"
}
```

Repeat this step with connector_id obtained in the step 2q to get dedicated model_id for query embedding.
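
The deploy call that the reviewer guide reports as missing could follow registration roughly as sketched here. This is a minimal Python illustration that only builds the requests and sends nothing. The `Request` tuple and both helper names are hypothetical, the IDs are the sample values quoted above, and the deploy path is the one named in the "Missing Deploy Step" observation.

```python
from typing import NamedTuple, Optional


class Request(NamedTuple):
    """Hypothetical container for one REST call (nothing is sent)."""
    method: str
    path: str
    body: Optional[dict]


def register_model_request(name, model_group_id, connector_id):
    # Body mirrors the register call quoted from the blueprint.
    return Request("POST", "/_plugins/_ml/models/_register", {
        "name": name,
        "function_name": "remote",
        "model_group_id": model_group_id,
        "description": "YC embedding model",
        "connector_id": connector_id,
    })


def deploy_request(model_id):
    # Path taken from the reviewer guide's example; no request body.
    return Request("POST", f"/_plugins/_ml/models/{model_id}/_deploy", None)


# Sample IDs copied from the responses quoted in this review.
plan = [
    register_model_request("yc-embedding", "4THNtZoBdUNOOrVAzj_V", "CTEou5oBdUNOOrVArUAU"),
    deploy_request("CzEou5oBdUNOOrVA10Db"),
]

for request in plan:
    print(request.method, request.path)
```

Under the two-connector setup discussed in this PR, the same pair of calls would presumably be repeated for the query model, using the `model_id` returned by its own registration.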


[Incomplete Query Connector](https://github.com/opensearch-project/ml-commons/pull/4469/files#diff-7b7db5304fc1e2e27d5fc0a4f894c2963943193b3467e1c1ac8e77040923490bR65-R67)

Section 2q only provides a text description instructing users to duplicate and modify the connector, without providing an actual code snippet. This is inconsistent with the rest of the documentation and may lead to user errors when creating the query embedding connector.

```markdown
### 2q. Create connector for query embedding

Due to Yandex Cloud using distinct [models](https://yandex.cloud/en/docs/ai-studio/concepts/embeddings) for query processing and document processing, separate connectors are required for each purpose. To create the query processing connector, duplicate the connector definition above and replace `text-search-doc` with `text-search-query`.
```
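
The duplication that section 2q describes in prose can be sketched as a small transformation. The connector dictionary below is a trimmed, hypothetical stand-in that carries only the `modelUri` and `folder_id` parameters mentioned in this PR, and `to_query_connector` is an illustrative helper, not part of ml-commons.

```python
import copy

DOC_MODEL = "text-search-doc"
QUERY_MODEL = "text-search-query"


def to_query_connector(doc_connector):
    """Return a copy of the document connector that targets the query model."""
    query_connector = copy.deepcopy(doc_connector)
    parameters = query_connector["parameters"]
    if DOC_MODEL not in parameters["modelUri"]:
        raise ValueError("modelUri does not reference " + DOC_MODEL)
    # Only the model name inside modelUri changes; everything else is kept.
    parameters["modelUri"] = parameters["modelUri"].replace(DOC_MODEL, QUERY_MODEL)
    return query_connector


# Hypothetical, trimmed connector definition; placeholders left unfilled on purpose.
doc_connector = {
    "parameters": {
        "modelUri": "emb://<folder_ID>/text-search-doc/latest",
        "folder_id": "<folder_ID>",
    },
}

query_connector = to_query_connector(doc_connector)
print(query_connector["parameters"]["modelUri"])
```

Generating the second definition this way avoids hand-editing two nearly identical JSON payloads, which is the "spot the difference" concern raised in the review thread.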


github-actions bot commented Mar 3, 2026

PR Code Suggestions ✨

Latest suggestions up to a79f3bb

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Mismatched pre/post processing functions for provider

The connector is for Yandex Cloud embeddings, but the pre/post processing functions
reference bedrock.embedding, which is specific to AWS Bedrock. Using
Bedrock-specific processing functions with a Yandex Cloud API may cause incorrect
request formatting or response parsing. Verify that these Bedrock functions are
intentionally compatible with Yandex Cloud's API format, or use the appropriate
Yandex Cloud-specific processing functions if available.

docs/remote_inference_blueprints/yandexcloud_connector_embedding_legacy_blueprint.md [46-47]

-"pre_process_function": "connector.pre_process.bedrock.embedding",
-"post_process_function": "connector.post_process.bedrock.embedding"
+"pre_process_function": "connector.pre_process.default.embedding",
+"post_process_function": "connector.post_process.default.embedding"
Suggestion importance[1-10]: 6

Why: The blueprint uses connector.pre_process.bedrock.embedding and connector.post_process.bedrock.embedding for a Yandex Cloud connector, which could cause issues if the Bedrock functions don't match Yandex Cloud's API format. However, the document itself notes "Using these particular pre/post processing functions is crucial," suggesting this may be intentional, and the improved_code uses default.embedding which may not be correct either.

Low
General
Parameter name mismatch in request body

The inputText parameter name used in request_body must match exactly what the
pre-processing function outputs. If connector.pre_process.bedrock.embedding
transforms the input into a different parameter name (e.g., input), the request body
will fail to substitute the value correctly. Ensure the parameter name in
request_body aligns with the output of the chosen pre-processing function.

docs/remote_inference_blueprints/yandexcloud_connector_embedding_legacy_blueprint.md [45]

-"request_body": "{ \"text\": \"${parameters.inputText}\", \"modelUri\": \"${parameters.modelUri}\" }",
+"request_body": "{ \"text\": \"${parameters.input}\", \"modelUri\": \"${parameters.modelUri}\" }",
Suggestion importance[1-10]: 5

Why: If connector.pre_process.bedrock.embedding outputs a parameter named input rather than inputText, the request_body substitution would fail. This is a valid concern about parameter name alignment between the pre-processing function output and the request body template.

Low
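
To see why the parameter name matters, the sketch below imitates `${parameters.*}` substitution with a regular expression. It is a simplified stand-in, not the ml-commons implementation, and it makes no claim about which name `connector.pre_process.bedrock.embedding` actually emits. `fill` is a hypothetical helper; the template is the `request_body` quoted in the suggestion.

```python
import re

REQUEST_BODY = '{ "text": "${parameters.inputText}", "modelUri": "${parameters.modelUri}" }'
PLACEHOLDER = re.compile(r"\$\{parameters\.(\w+)\}")


def fill(template, parameters):
    """Substitute known parameters; return the text and the names left unresolved."""
    missing = []

    def substitute(match):
        name = match.group(1)
        if name in parameters:
            return str(parameters[name])
        missing.append(name)
        return match.group(0)  # leave the placeholder untouched

    return PLACEHOLDER.sub(substitute, template), missing


model_uri = "emb://<folder_ID>/text-search-doc/latest"

# Key matches the template: nothing is left unresolved.
body_ok, missing_ok = fill(REQUEST_BODY, {"inputText": "hello", "modelUri": model_uri})

# Key named "input" instead: the inputText placeholder survives in the body.
body_bad, missing_bad = fill(REQUEST_BODY, {"input": "hello", "modelUri": model_uri})

print(missing_ok)   # []
print(missing_bad)  # ['inputText']
```

Whichever name the chosen pre-processing function produces, the template and the predict request have to agree on it; the sketch only makes that dependency visible.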

Previous suggestions

Suggestions up to commit c45bb98
Category | Suggestion | Impact
Possible issue
Fix mismatched pre/post processing functions

The connector is using Bedrock-specific pre/post processing functions for a Yandex
Cloud embedding connector. Yandex Cloud's textEmbedding API returns a response with
a different structure than AWS Bedrock, so using
connector.pre_process.bedrock.embedding and connector.post_process.bedrock.embedding
will likely fail or produce incorrect results. You should use the appropriate
pre/post processing functions that match the Yandex Cloud API response format, or
implement a custom function.

docs/remote_inference_blueprints/yandexcloud_connector_embedding_legacy_blueprint.md [46-47]

-"pre_process_function": "connector.pre_process.bedrock.embedding",
-"post_process_function": "connector.post_process.bedrock.embedding"
+"pre_process_function": "connector.pre_process.default.embedding",
+"post_process_function": "connector.post_process.default.embedding"
Suggestion importance[1-10]: 6

Why: The suggestion correctly identifies that connector.pre_process.bedrock.embedding and connector.post_process.bedrock.embedding are Bedrock-specific functions being used for a Yandex Cloud connector. However, the note on line 63 explicitly states "Using these particular pre/post processing functions is crucial," suggesting this may be intentional. The improved_code uses default.embedding functions which may not be correct either without verification of the actual Yandex Cloud API response format.

Low
General
Align request body parameter name with pre-process function

The inputText parameter name used in request_body must match the parameter name used
in the predict request. In the test inference section (step 4), the parameter is
also called inputText, which is consistent. However, the pre-process function for
Bedrock embedding typically expects an input field, not inputText. If using a custom
or default pre-process function, ensure the parameter name aligns with what the
function injects.

docs/remote_inference_blueprints/yandexcloud_connector_embedding_legacy_blueprint.md [45]

-"request_body": "{ \"text\": \"${parameters.inputText}\", \"modelUri\": \"${parameters.modelUri}\" }",
+"request_body": "{ \"text\": \"${parameters.input}\", \"modelUri\": \"${parameters.modelUri}\" }",
Suggestion importance[1-10]: 3

Why: The suggestion points out a potential inconsistency between inputText in the request body and what a pre-process function might inject. However, the document consistently uses inputText in both the connector definition and the test inference step (line 114), and the note explicitly says these processing functions are crucial, suggesting the current naming may be intentional for this specific integration.

Low
Suggestions up to commit 59496d6
Category | Suggestion | Impact
Possible issue
Fix mismatched pre/post processing functions

The connector is for Yandex Cloud embeddings, but the pre/post processing functions
reference bedrock (AWS Bedrock). This is likely incorrect and may cause the
connector to fail or produce wrong results. Verify whether Yandex Cloud's embedding
API response format matches Bedrock's, and if not, use the appropriate processing
functions (e.g., connector.pre_process.default.embedding and
connector.post_process.default.embedding).

docs/remote_inference_blueprints/yandexcloud_connector_embedding_legacy_blueprint.md [46-47]

-"pre_process_function": "connector.pre_process.bedrock.embedding",
-"post_process_function": "connector.post_process.bedrock.embedding"
+"pre_process_function": "connector.pre_process.default.embedding",
+"post_process_function": "connector.post_process.default.embedding"
Suggestion importance[1-10]: 7

Why: The pre_process_function and post_process_function reference bedrock which is AWS-specific, while this connector is for Yandex Cloud. The note on line 63 explicitly states "Using these particular pre/post processing functions is crucial," suggesting intentional use, but the mismatch with Bedrock is suspicious and could cause incorrect behavior if the API response formats differ.

Medium
Suggestions up to commit ce3d129
Category | Suggestion | Impact
Possible issue
Mismatched pre/post processing functions used

The connector is using Bedrock-specific pre/post processing functions for a Yandex
Cloud embedding connector. Since Yandex Cloud's API response format differs from AWS
Bedrock's, using connector.pre_process.bedrock.embedding and
connector.post_process.bedrock.embedding may cause incorrect data parsing or
failures. Verify that these functions are compatible with Yandex Cloud's
request/response format, or use the appropriate generic/custom processing functions.

docs/remote_inference_blueprints/yandexcloud_connector_embedding_legacy_blueprint.md [46-47]

-"pre_process_function": "connector.pre_process.bedrock.embedding",
-"post_process_function": "connector.post_process.bedrock.embedding"
+"pre_process_function": "connector.pre_process.default.embedding",
+"post_process_function": "connector.post_process.default.embedding"
Suggestion importance[1-10]: 6

Why: The use of connector.pre_process.bedrock.embedding and connector.post_process.bedrock.embedding for a Yandex Cloud connector is potentially incorrect since Yandex Cloud's API format differs from AWS Bedrock's. The document itself notes "Using these particular pre/post processing functions is crucial," which suggests intentional use, but this warrants verification. The improved_code differs from existing_code, making this a valid suggestion worth investigating.

Low
Security
Input text not escaped in JSON request body

The inputText parameter in the request body is directly interpolated without any
escaping. If the input text contains special characters such as quotes or
backslashes, it could break the JSON structure. Consider using a pre-process
function or ensuring proper escaping is applied to the input text.

docs/remote_inference_blueprints/yandexcloud_connector_embedding_legacy_blueprint.md [45]

+"request_body": "{ \"text\": \"${parameters.inputText}\", \"modelUri\": \"${parameters.modelUri}\" }",
 
-
Suggestion importance[1-10]: 2

Why: The existing_code and improved_code are identical, meaning no actual fix is proposed. While the concern about JSON injection via unescaped inputText is valid in principle, the suggestion provides no concrete improvement to the code.

Low
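
The escaping concern can be reproduced in isolation. The sketch below assumes naive string interpolation into the quoted `request_body` template; whether ml-commons escapes parameter values before substitution is not stated in this thread, so this illustrates the failure mode rather than the plugin's behaviour. `render_raw`, `render_escaped`, and `is_valid_json` are hypothetical helpers.

```python
import json

TEMPLATE = '{ "text": "${parameters.inputText}", "modelUri": "${parameters.modelUri}" }'
MODEL_URI = "emb://<folder_ID>/text-search-doc/latest"


def render_raw(text, model_uri):
    """Interpolate values as-is, with no escaping."""
    body = TEMPLATE.replace("${parameters.inputText}", text)
    return body.replace("${parameters.modelUri}", model_uri)


def render_escaped(text, model_uri):
    """JSON-escape each value first; drop the quotes json.dumps adds,
    because the template already supplies them."""
    return render_raw(json.dumps(text)[1:-1], json.dumps(model_uri)[1:-1])


def is_valid_json(payload):
    try:
        json.loads(payload)
        return True
    except json.JSONDecodeError:
        return False


tricky = 'He said "hi" \\ bye'  # contains double quotes and a backslash

print(is_valid_json(render_raw(tricky, MODEL_URI)))      # False
print(is_valid_json(render_escaped(tricky, MODEL_URI)))  # True
```

A concrete fix for the blueprint would depend on how the connector framework performs substitution, which is why the suggestion above proposes no code change.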
Suggestions up to commit 0f8ac3d
Category | Suggestion | Impact
Possible issue
Fix incorrect pre/post processing functions

The connector is using Bedrock-specific pre/post processing functions for a Yandex
Cloud embedding connector. Yandex Cloud's textEmbedding API returns a response in a
different format than Amazon Bedrock, so using
connector.pre_process.bedrock.embedding and connector.post_process.bedrock.embedding
will likely fail or produce incorrect results. You should use the appropriate
pre/post processing functions that match the Yandex Cloud API response format, or
provide a custom Painless script.

docs/remote_inference_blueprints/yandexcloud_connector_embedding_legacy_blueprint.md [46-47]

-"pre_process_function": "connector.pre_process.bedrock.embedding",
-"post_process_function": "connector.post_process.bedrock.embedding"
+"pre_process_function": "connector.pre_process.default.embedding",
+"post_process_function": "connector.post_process.default.embedding"
Suggestion importance[1-10]: 6

Why: The suggestion correctly identifies that using connector.pre_process.bedrock.embedding and connector.post_process.bedrock.embedding for a Yandex Cloud connector may be incorrect since the API response formats differ. However, the improved_code uses default.embedding functions which may also not be correct for Yandex Cloud's specific response format, making this a partial fix at best.

Low
General
Verify input parameter name consistency

The inputText parameter name used in request_body does not match the parameter name
used in the predict request body in step 4. While inputText is used in the predict
call, it should be consistent and clearly documented. More importantly, the
pre-process function for Bedrock embedding typically maps text_docs to the input, so
if a custom pre-process is used, the parameter name must align with what the
function expects.

docs/remote_inference_blueprints/yandexcloud_connector_embedding_legacy_blueprint.md [45]

+"request_body": "{ \"text\": \"${parameters.inputText}\", \"modelUri\": \"${parameters.modelUri}\" }",
 
-
Suggestion importance[1-10]: 1

Why: The existing_code and improved_code are identical, meaning no actual change is proposed. The suggestion only asks to verify consistency without providing a concrete fix.

Low
Suggestions up to commit f22dac1
Category | Suggestion | Impact
Possible issue
Fix incorrect Bedrock processing functions for Yandex

The pre/post processing functions reference Bedrock-specific functions
(connector.pre_process.bedrock.embedding and
connector.post_process.bedrock.embedding), but this connector is for Yandex Cloud.
The Bedrock embedding pre/post processing functions expect a specific
request/response format tied to AWS Bedrock's API, which likely differs from Yandex
Cloud's API format. These should be replaced with the correct Yandex Cloud-specific
processing functions or custom ones that match the Yandex Cloud embedding API's
input/output schema.

docs/remote_inference_blueprints/yandexcloud_connector_embedding_legacy_blueprint.md [45-46]

-"pre_process_function": "connector.pre_process.bedrock.embedding",
-"post_process_function": "connector.post_process.bedrock.embedding"
+"pre_process_function": "connector.pre_process.default.embedding",
+"post_process_function": "connector.post_process.default.embedding"
Suggestion importance[1-10]: 6

Why: The suggestion raises a valid concern about using Bedrock-specific pre/post processing functions for a Yandex Cloud connector. However, the README note explicitly states "Using these particular pre/post processing functions is crucial," suggesting this may be intentional. The improved_code proposes default.embedding functions without strong evidence these are correct for Yandex Cloud's API format.

Low
General
Clarify duplicate placeholder dependency in parameters

The <folder_ID> placeholder appears twice and must be kept in sync manually, which is
error-prone. Since modelUri already embeds the folder ID, users may forget to update
both occurrences. Consider adding a clear note or restructuring the example to
highlight this dependency, or use a single placeholder reference.

docs/remote_inference_blueprints/yandexcloud_connector_embedding_legacy_blueprint.md [29-30]

-"modelUri": "emb://<folder_ID>/text-search-<doc|query>/latest",
-"folder_id":"<folder_ID>"
+"modelUri": "emb://<folder_ID>/text-search-doc/latest",
+"folder_id": "<folder_ID>"
Suggestion importance[1-10]: 2

Why: The suggestion about duplicate <folder_ID> placeholders is a minor documentation concern, but the improved_code only changes <doc|query> to doc and adds a space, which doesn't actually address the stated problem of keeping placeholders in sync. The improvement is minimal and doesn't resolve the core concern.

Low

@github-actions

Persistent review updated to latest commit f22dac1

}
```

Note: Replace all `<placeholders>` in the preceding code snippet with appropriate values, while preserving `${curly braces}` syntax exactly as shown. Short-lived [bearer tokens](https://yandex.cloud/en/docs/iam/concepts/authorization/iam-token) (valid ~12 hours) may be used as an alternative to [API keys](https://yandex.cloud/en/docs/iam/concepts/authorization/api-key). API keys must be granted either `yc.ai.languageModels.execute` or `yc.ai.foundationModels.execute` roles. Also refer to [the guide](https://yandex.cloud/en/docs/ai-studio/security/). Additionally, due to distinct [models](https://yandex.cloud/en/docs/ai-studio/concepts/embeddings) being employed for query processing versus document processing, two dedicated connectors are required. Using these particular pre/post processing functions is crucial.
Collaborator


Thanks for the contribution! One suggestion: the note mentions that two dedicated connectors are required (one for text-search-doc and one for text-search-query), but the blueprint only walks through creating a single connector. Since this is meant to be a step-by-step guide, could we either:

  1. Show both connector configs explicitly (e.g., "Step 2a: Create connector for document embedding" and "Step 2b: Create connector for query embedding"), or
  2. At minimum, move the two-connector requirement out of the note paragraph and into a clearly labeled step

Author


You’re right. I’d prefer not to repeat nearly identical JSON snippets and turn it into a “spot the difference” exercise, so I tried to balance explicitness with verbosity.

Do you think the current level of detail is sufficient?

Also, would you be OK with using “2d” and “2q” as sub-step labels? They’re ordered and (I think) self-explanatory.

@github-actions

Persistent review updated to latest commit 0f8ac3d

@github-actions

Persistent review updated to latest commit ce3d129

@github-actions

Persistent review updated to latest commit 59496d6

@github-actions

Failed to generate code suggestions for PR

add Yandex Cloud embeddings connector blueprint

Signed-off-by: Mikhail Khludnev <mkhl@apache.org>
@github-actions

Failed to generate code suggestions for PR


github-actions bot commented Apr 4, 2026

Persistent review updated to latest commit c45bb98

@mkhludnev
Author

@dhrubo-os may I ask for your attention?


github-actions bot commented Apr 7, 2026

Persistent review updated to latest commit a79f3bb


4 participants