gateway/1.12/modules/ROOT/pages/flex-gateway-llm-proxy-create-llm-proxy.adoc
NOTE: A large Flex Gateway supports up to 50 LLM Proxies.

See xref:flex-gateway-managed-set-up.adoc[].
. Ensure you have the API Manager *API Creator* permission.
. Retrieve your API keys from your LLM Providers.
. xref:flex-gateway-llm-proxy-semantic-service.adoc[Configure a semantic service] if you want to use semantic routing.

[[create-an-llm-proxy]]
== Create an LLM Proxy
.. Select your *LLM Provider*.
.. Ensure the *URL* for your provider is correct. Edit if necessary.
.. Configure access details for the provider endpoint.
.. Select a *Static* or *Dynamic* API Key. If selecting *Dynamic* API Key, define a DataWeave script to extract the API Key from the incoming request.
.. Select a *Target Model* to override the model version specified in the payload. Selecting *Not Applicable* sends the request to the specified model. A *Target Model* is required for semantic routing.
+
[NOTE]
====
To configure a target model for Amazon Bedrock Claude Models, you must enter the model ID. To learn how to find the model ID, see xref:flex-gateway-llm-proxy-request.adoc#amazon-bedrock-model-names[Amazon Bedrock Model Names].
====
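A minimal sketch of the DataWeave extraction script the *Dynamic* API Key option asks for, assuming the key arrives in a custom request header (the header name `x-llm-api-key` is a hypothetical example, and the exact expression context depends on your policy configuration):

```dataweave
%dw 2.0
output application/json
---
// Hypothetical example: read the provider API key from a
// custom request header; substitute your own header name.
attributes.headers['x-llm-api-key']
```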
.. Click *Add LLM Route* to add additional routes. Complete the previous steps to configure the new route. Each LLM Provider supports one route.
. If adding multiple routes, select a *Routing strategy*. To configure your routing strategy, see:
.. <<configure-model-based-routing>>
.. <<configure-semantic-routing>>

[[configure-semantic-routing]]
== Configure Semantic Routing

To configure semantic routing:

. Ensure you have already xref:flex-gateway-llm-proxy-semantic-service.adoc[configured a semantic service].
. Configure multiple routes and select a target model for each route. Click *Add LLM Route* to create new routes.
. Select *Semantic* for *Routing strategy*.
. Click *Select a service* and select your semantic service.
. Define or select prompt topics for the routes:
** Advanced scale semantic service:
... Select prompt topics from your predefined prompt topics.
** Basic scale semantic service:
... Click *Select prompt topics*.
... Click *+ Create prompt topic*.
... Define a *Prompt topic name*.
... Define *Prompt utterances* or click *Upload utterances* to upload a plain text file containing your prompt utterances.
... Click *Create*.
... Create multiple prompt topics for each route as needed.
. Configure a *Fallback route* to send the request to if it doesn't match a semantic route:
.. Specify an accuracy threshold. When the accuracy of the semantic match is less than this threshold, traffic is sent to the fallback route.
.. Select a *Route* to fall back to.
.. Select a *Target model* for the fallback route to use.
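The fallback behavior above can be sketched as follows. This is an illustrative model of the decision, not Flex Gateway's implementation; the route names and scores are made up:

```python
def pick_route(match_scores, threshold, fallback_route):
    """Return the best-matching route, or the fallback route when the
    best semantic match scores below the accuracy threshold."""
    best_route, best_score = max(match_scores.items(), key=lambda kv: kv[1])
    if best_score < threshold:
        return fallback_route
    return best_route

# The billing route matches strongly, so it wins.
print(pick_route({"billing": 0.91, "support": 0.40}, 0.75, "general"))  # → billing
# A weak best match falls through to the fallback route.
print(pick_route({"billing": 0.55, "support": 0.40}, 0.75, "general"))  # → general
```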
. Create a *Semantic prompt guard* to block users from asking the server about specific topics:
** Advanced scale semantic service:
... Select topics from your predefined prompt topics.
** Basic scale semantic service:
... Click *+ Create deny list*.
... Define a *Prompt topic name*.
... Define *Prompt utterances* or click *Upload utterances* to upload a plain text file containing your prompt utterances.
... Click *Create*.
... Create multiple deny list topics to better protect your LLM Proxy.

NOTE: Creating a semantic prompt guard automatically applies the Semantic Prompt Guard policy.

. Return to <<create-an-llm-proxy>> step 7 to finish configuring your LLM Proxy.
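A semantic prompt guard can be thought of as the same matching step applied to deny list topics: if the request is similar enough to any deny list utterance, it is blocked. A minimal sketch, assuming a precomputed list of similarity scores and a made-up cutoff value:

```python
def is_blocked(similarities, cutoff=0.8):
    """Block the request when its best similarity to any deny list
    utterance reaches the cutoff (the 0.8 value is illustrative)."""
    return max(similarities, default=0.0) >= cutoff

print(is_blocked([0.35, 0.91]))  # → True
print(is_blocked([0.35, 0.40]))  # → False
```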
A semantic service compares an incoming request to the defined prompt topic utterances and sends the request to the route that best matches it. The semantic service also compares the request to deny list topic utterances to block certain requests.

LLM Proxy supports two types of semantic services:

* <<configure-an-advanced-scale-semantic-service, Advanced Scale>>: For complex semantic routing. Advanced scale semantic services use a vector database to store and compare prompt topic utterances. Advanced scale semantic services support unlimited prompt topics and 2000 utterances per prompt topic.
* <<configure-a-basic-scale-semantic-service, Basic Scale>>: For simple semantic routing and blocking. Basic scale semantic services support up to 6 prompt topics and 10 utterances per prompt topic.
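The comparison the semantic service performs can be sketched roughly as follows. This is a conceptual illustration using toy 3-dimensional vectors in place of real embeddings; in practice the configured embedding model produces the vectors:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def best_topic(request_vec, topic_utterance_vecs):
    """Score each topic by its closest utterance embedding and
    return (best_topic_name, best_score)."""
    scores = {
        topic: max(cosine(request_vec, u) for u in utterances)
        for topic, utterances in topic_utterance_vecs.items()
    }
    topic = max(scores, key=scores.get)
    return topic, scores[topic]

# Toy vectors standing in for real embedding-model output.
topics = {
    "refunds": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
    "shipping": [[0.0, 1.0, 0.2]],
}
topic, score = best_topic([0.95, 0.15, 0.05], topics)
print(topic)  # → refunds
```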

[[configure-an-advanced-scale-semantic-service]]
== Configure an Advanced Scale Semantic Service

. From API Manager, click *Semantic Service Configuration*.
. Click *+ Create a Semantic Service Configuration*.
. Select *Advanced Scale*.
. Configure the semantic service parameters:
** *Service label*: Label to identify the new service.
** *Embedding Service Provider*: The provider of the embedding model. *OpenAI* or *Hugging Face*.
** *URL*: The URL of the embedding service.
** *Model*: The embedding model to use.
** *Auth key*: The API authentication key for the embedding service.
. Click *Vector connection*.
. Select a *Vector Database Provider* from these options:
** *Qdrant*
** *Pinecone*
** *Azure AI Search*
. Configure the parameters to connect to your database.
. Create prompt topics:
.. Click *Create prompt topics*.
.. Define a *Prompt topic name*.
.. Define *Prompt utterances* or click *Upload utterances* to upload a plain text file containing your prompt utterances.
.. Create as many prompt topics as necessary. You can also create new prompt topics later by editing the semantic service.
+
NOTE: To deny users from asking about certain subjects, create prompt topics for the subjects and apply them as deny list topics when configuring your LLM Proxy.
. Click *Save & download script*.
. Run the downloaded `.sh` script against your database to populate it with your scaled vectors.
[[configure-a-basic-scale-semantic-service]]
== Configure a Basic Scale Semantic Service

. From API Manager, click *Semantic Service Configuration*.
. Click *+ Create a Semantic Service Configuration*.
. Select *Basic Scale*.
. Configure the semantic service parameters:
** *Service label*: Label to identify the new service.
** *Embedding Service Provider*: The provider of the embedding model. *OpenAI* or *Hugging Face*.
** *URL*: The URL of the embedding service.
** *Model*: The embedding model to use.
** *Auth key*: The API authentication key for the embedding service.
. Click *Deploy*.

[[basic-scale-semantic-service-routing-limits]]
=== Basic Scale Semantic Service Routing Limits

[%header%autowidth.spread,cols="a,a"]
|===
| Limit | Value
| Prompt topics (across all routes of an LLM Proxy) | 6
| Utterances per prompt topic | 10
| Deny list topics | 6
| Utterances per deny list topic | 10
| Maximum characters per utterance | 500
|===
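The limits in the table above can be checked before configuring a basic scale service. This helper is purely illustrative and not part of any MuleSoft tooling:

```python
# Limits from the Basic Scale Semantic Service Routing Limits table.
LIMITS = {"max_topics": 6, "max_utterances": 10, "max_chars": 500}

def validate_topics(topics):
    """topics: dict of topic name -> list of utterance strings.
    Returns a list of human-readable limit violations."""
    problems = []
    if len(topics) > LIMITS["max_topics"]:
        problems.append("too many prompt topics")
    for name, utterances in topics.items():
        if len(utterances) > LIMITS["max_utterances"]:
            problems.append(f"{name}: too many utterances")
        for u in utterances:
            if len(u) > LIMITS["max_chars"]:
                problems.append(f"{name}: utterance over 500 characters")
    return problems

print(validate_topics({"billing": ["refund status"] * 11}))
# → ['billing: too many utterances']
```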

== Edit a Semantic Service

To edit a semantic service:

. From *Semantic Service Configuration*, click the three-dots menu (image:three-dots-menu.png[3%,3%]) of the semantic service you want to edit.
. Make the necessary edits.
. Click *Redeploy*.

=== Redownload Vector Script

If you create new prompt topics, you must redownload the vector script and run it against your database again:

. From *Semantic Service Configuration*, click the three-dots menu (image:three-dots-menu.png[3%,3%]) of the advanced scale semantic service whose script you want to download.