
Commit 9a55fa6

Merge pull request #1060 from mulesoft/W-22053202-advanced-semantic-service-gr
W-22053202 advanced semantic service gr
2 parents 9473735 + affd1c3 commit 9a55fa6

3 files changed

Lines changed: 107 additions & 56 deletions

File tree

gateway/1.12/modules/ROOT/nav.adoc

Lines changed: 1 addition & 0 deletions
@@ -33,6 +33,7 @@
  // LLM Proxy
  * xref:flex-gateway-llm-proxy.adoc[]
  ** xref:flex-gateway-llm-proxy-create-llm-proxy.adoc[]
+ ** xref:flex-gateway-llm-proxy-semantic-service.adoc[]
  ** xref:flex-gateway-llm-proxy-request.adoc[]
  ** xref:flex-gateway-llm-proxy-try-out.adoc[]
  ** xref:flex-gateway-llm-proxy-token-reports.adoc[]

gateway/1.12/modules/ROOT/pages/flex-gateway-llm-proxy-create-llm-proxy.adoc

Lines changed: 24 additions & 56 deletions
@@ -16,6 +16,7 @@ NOTE: A large Flex Gateway supports up to 50 LLM Proxies.
  See xref:flex-gateway-managed-set-up.adoc[].
  . Ensure you have the API Manager *API Creator* permission.
  . Retrieve your API keys from your LLM Providers.
+ . xref:flex-gateway-llm-proxy-semantic-service.adoc[Configure a semantic service] if you want to use semantic routing.

  [[create-an-llm-proxy]]
  == Create an LLM Proxy
@@ -36,6 +37,7 @@ See xref:flex-gateway-managed-set-up.adoc[].
  .. Select your *LLM Provider*.
  .. Ensure the *URL* for your provider is correct. Edit if necessary.
  .. Configure access details for the provider endpoint.
+ .. Select a *Static* or *Dynamic* API Key. If you select *Dynamic*, define a DataWeave script that extracts the API key from the incoming request.
  .. Select a *Target Model* to override the model version specified in the payload. Selecting *Not Applicable* sends the request to the specified model. A *Target Model* is required for semantic routing.
  +
  [NOTE]
@@ -45,9 +47,7 @@ To configure a target model for Amazon Bedrock Claude Models, you must enter the
  To learn how to find the model ID, see xref:flex-gateway-llm-proxy-request.adoc#amazon-bedrock-model-names[Amazon Bedrock Model Names].
  ====

- .. Click *Add LLM Route* to add additional routes. Complete the previous steps to configure the new route.
- +
- NOTE: Each LLM Provider can support one route.
+ .. Click *Add LLM Route* to add additional routes. Complete the previous steps to configure the new route. Each LLM Provider supports one route.
  . If adding multiple routes, select a *Routing strategy*. To configure your routing strategy, see:
  .. <<configure-model-based-routing>>
  .. <<configure-semantic-routing>>
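The *Dynamic* API Key option in the route configuration above relies on a DataWeave script to pull the provider key out of each incoming request. Conceptually, that script does something like the following Python sketch (the `x-llm-api-key` header name is a hypothetical example, not a documented field):

```python
def extract_api_key(request_headers):
    """Stand-in for the DataWeave script on a Dynamic API Key route.

    Looks up a hypothetical 'x-llm-api-key' header on the incoming
    request and returns its value for the upstream LLM provider call.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in request_headers.items()}
    key = normalized.get("x-llm-api-key")
    if not key:
        raise ValueError("incoming request carries no API key")
    return key
```

The real script can extract the key from any part of the request the DataWeave expression can reach; a header is just the simplest case.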
@@ -68,71 +68,39 @@ NOTE: Each LLM Provider can support one route.
  [[configure-semantic-routing]]
  == Configure Semantic Routing

- For semantic routing, define and apply prompt topics to each route. Define deny list topics to block certain requests.
-
  To configure semantic routing:

- . Configure multiple routes. Click *Add LLM Route* to create new routes.
+ . Ensure you have already xref:flex-gateway-llm-proxy-semantic-service.adoc[configured a semantic service].
+ . Configure multiple routes and select a target model for each route. Click *Add LLM Route* to create new routes.
  . Select *Semantic* for *Routing strategy*.
- . If you haven't already, click *Configure Semantic Service*.
- +
- To create a semantic service, see <<create-a-semantic-service>>.
- . Select a *Target Model* for each route.
- . Define a prompt topics for the routes:
- .. Click the *Select prompt topics*.
- .. Click *+ Create prompt topic*.
- .. Define a *Prompt topic name*.
- .. Define a *Prompt utterances* or click *Upload utterances* to upload a plain text file containing your prompt utterances.
- .. Click *Create*.
- .. Create multiple prompt topics for each route as needed.
+ . Click *Select a service* and select the semantic service to use.
+ . Define or select prompt topics for the routes:
+ ** Advanced scale semantic service:
+ ... Select prompt topics from your predefined prompt topics.
+ ** Basic scale semantic service:
+ ... Click *Select prompt topics*.
+ ... Click *+ Create prompt topic*.
+ ... Define a *Prompt topic name*.
+ ... Define *Prompt utterances* or click *Upload utterances* to upload a plain text file containing your prompt utterances.
+ ... Click *Create*.
+ ... Create multiple prompt topics for each route as needed.
  . Configure a *Fallback route* to receive requests that don't match a semantic route:
  .. Specify an accuracy threshold. When the accuracy of the semantic match is less than this threshold, traffic is sent to the fallback route.
  .. Select a *Route* to fall back to.
  .. Select a *Target model* for the fallback route to use.
  . Create a *Semantic prompt guard* to block users from asking the server about specific topics:
- .. Click *+ Create deny list*.
- .. Define a *Prompt topic name*.
- .. Define a *Prompt utterances* or click *Upload utterances* to upload a plain text file containing your prompt utterances.
- .. Click *Create*.
- .. Create multiple deny list topics to better protect your LLM Proxy.
+ ** Advanced scale semantic service:
+ ... Select topics from your predefined prompt topics.
+ ** Basic scale semantic service:
+ ... Click *+ Create deny list*.
+ ... Define a *Prompt topic name*.
+ ... Define *Prompt utterances* or click *Upload utterances* to upload a plain text file containing your prompt utterances.
+ ... Click *Create*.
+ ... Create multiple deny list topics as needed.
  +
  NOTE: Creating a semantic prompt guard automatically applies the Semantic Prompt Guard policy.
  . Return to <<create-an-llm-proxy>> step 7 to finish configuring your LLM Proxy.

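Under the hood, semantic routing as configured above reduces to nearest-utterance matching against each route's prompt topics, with the accuracy threshold deciding when to fall back and the deny list blocking requests outright. A minimal Python sketch of that decision over precomputed embeddings (illustrative only; the actual matching is performed by the configured semantic service):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def route_request(request_vec, topic_vecs, deny_vecs, threshold, fallback="fallback"):
    """Pick a route for a request embedding, or block it.

    topic_vecs: {route_name: [utterance embeddings]}
    deny_vecs:  [deny-list utterance embeddings]
    Returns the route name, the fallback route, or None if blocked.
    """
    # Semantic prompt guard: block if any deny-list utterance matches.
    if any(cosine(request_vec, v) >= threshold for v in deny_vecs):
        return None
    # Score each route by its best-matching utterance.
    best_route, best_score = fallback, 0.0
    for route, vecs in topic_vecs.items():
        score = max(cosine(request_vec, v) for v in vecs)
        if score > best_score:
            best_route, best_score = route, score
    # Below the accuracy threshold, traffic goes to the fallback route.
    return best_route if best_score >= threshold else fallback
```

The threshold plays the same role as the accuracy threshold in the *Fallback route* step: a weak best match is treated as "no match" rather than forced onto the closest route.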
- === Semantic Routing Limits
-
- [%header%autowidth.spread,cols="a,a"]
- |===
- | Limit | Value
- | Prompt topics (across all routes of an LLM Proxy) | 6
- | Utterances per prompt topic | 10
- | Deny list topics | 6
- | Utterances per deny list topic | 10
- | Maximum characters per utterance | 500
- |===
-
- [[create-a-semantic-service]]
- === Create and Edit a Semantic Service
-
- A semantic service compares the request to the defined prompt topic utterances and sends the request to the route that best matches it. The semantic service also compares the request to deny list topic utterances to block certain requests. Only one semantic service is support for each environment.
-
- To define a semantic service:
-
- . From API Manager, click *Semantic Service Setup*.
- . Click *+ Create Semantic Service*.
- . Configure the semantic service parameters:
- ** *Embedding Service Provider*: The provider of the embedding model. *OpenAI* or *Hugging Face*.
- ** *URL*: The URL of the embedding service.
- ** *Model*: The embedding model to use.
- ** *Auth key*: The API authentication key for the embedding service.
- . Click *Deploy*.
-
- To edit a semantic service:
-
- . From *Semantic Service Setup*, click the three-dots menu (image:three-dots-menu.png[3%,3%]) of the semantic service you want to edit.
- . Make the necessary edits.
- . Click *Redeploy*.
-
  == Edit and Delete an LLM Proxy

  To edit an LLM Proxy:
gateway/1.12/modules/ROOT/pages/flex-gateway-llm-proxy-semantic-service.adoc

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
+ = Configuring a Semantic Service
+ ifndef::env-site,env-github[]
+ include::_attributes.adoc[]
+ endif::[]
+ :imagesdir: ../assets/images
+
+ A semantic service compares the incoming request to the defined prompt topic utterances and sends the request to the route that best matches it. The semantic service also compares the request to deny list topic utterances to block certain requests.
+
+ LLM Proxy supports two types of semantic services:
+
+ * <<configure-an-advanced-scale-semantic-service, Advanced Scale>>: For complex semantic routing. Advanced scale semantic services use a vector database to store and compare prompt topic utterances, and support unlimited prompt topics with up to 2000 utterances per prompt topic.
+ * <<configure-a-basic-scale-semantic-service, Basic Scale>>: For simple semantic routing and blocking. Basic scale semantic services support up to 6 prompt topics and 10 utterances per prompt topic.
+
+ [[configure-an-advanced-scale-semantic-service]]
+ == Configure an Advanced Scale Semantic Service
+
+ . From API Manager, click *Semantic Service Configuration*.
+ . Click *+ Create a Semantic Service Configuration*.
+ . Select *Advanced Scale*.
+ . Configure the semantic service parameters:
+ ** *Service label*: Label to identify the new service.
+ ** *Embedding Service Provider*: The provider of the embedding model. *OpenAI* or *Hugging Face*.
+ ** *URL*: The URL of the embedding service.
+ ** *Model*: The embedding model to use.
+ ** *Auth key*: The API authentication key for the embedding service.
+ . Click *Vector connection*.
+ . Select a *Vector Database Provider* from these options:
+ ** *Qdrant*
+ ** *Pinecone*
+ ** *Azure AI Search*
+ . Configure the parameters to connect your database.
+ . Create prompt topics:
+ .. Click *Create prompt topics*.
+ .. Define a *Prompt topic name*.
+ .. Define *Prompt utterances* or click *Upload utterances* to upload a plain text file containing your prompt utterances.
+ .. Create as many prompt topics as necessary. You can also create new prompt topics later by editing the semantic service.
+ +
+ NOTE: To deny users from asking about certain subjects, create prompt topics for those subjects and apply them as deny list topics when configuring your LLM Proxy.
+ . Click *Save & download script*.
+ . Run the downloaded `.sh` script against your database to populate it with your vectors.
+
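The downloaded `.sh` script populates the vector database with one embedded vector per prompt-topic utterance. What such a population step prepares looks roughly like this Python sketch (hypothetical record shape; Qdrant, Pinecone, and Azure AI Search each define their own upsert payloads):

```python
def build_vector_records(topics, embed):
    """Build one record per prompt-topic utterance, ready to upsert
    into a vector database.

    topics: {topic_name: [utterance, ...]}
    embed:  callable mapping text -> embedding vector
    """
    records = []
    for topic, utterances in topics.items():
        for i, text in enumerate(utterances):
            records.append({
                "id": f"{topic}-{i}",          # hypothetical id scheme
                "vector": embed(text),          # embedding of the utterance
                "payload": {"topic": topic, "utterance": text},
            })
    return records
```

Storing the topic name in each record's payload is what lets the service map a nearest-neighbor hit back to a route at request time.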
+ [[configure-a-basic-scale-semantic-service]]
+ == Configure a Basic Scale Semantic Service
+
+ . From API Manager, click *Semantic Service Configuration*.
+ . Click *+ Create a Semantic Service Configuration*.
+ . Select *Basic Scale*.
+ . Configure the semantic service parameters:
+ ** *Service label*: Label to identify the new service.
+ ** *Embedding Service Provider*: The provider of the embedding model. *OpenAI* or *Hugging Face*.
+ ** *URL*: The URL of the embedding service.
+ ** *Model*: The embedding model to use.
+ ** *Auth key*: The API authentication key for the embedding service.
+ . Click *Deploy*.
+
+ [[basic-scale-semantic-service-routing-limits]]
+ === Basic Scale Semantic Service Routing Limits
+
+ [%header%autowidth.spread,cols="a,a"]
+ |===
+ | Limit | Value
+ | Prompt topics (across all routes of an LLM Proxy) | 6
+ | Utterances per prompt topic | 10
+ | Deny list topics | 6
+ | Utterances per deny list topic | 10
+ | Maximum characters per utterance | 500
+ |===
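The limits in the table above can be checked before deploying a basic scale service. A small sketch (the limit values come from the table; the check itself is illustrative, not part of the product):

```python
# Limits from the Basic Scale routing limits table.
LIMITS = {"max_topics": 6, "max_utterances": 10, "max_chars": 500}

def check_basic_scale(topics):
    """Validate prompt topics against the basic scale limits.

    topics: {topic_name: [utterance, ...]}
    Returns a list of human-readable violations (empty if valid).
    """
    problems = []
    if len(topics) > LIMITS["max_topics"]:
        problems.append(f"too many prompt topics: {len(topics)} > {LIMITS['max_topics']}")
    for name, utterances in topics.items():
        if len(utterances) > LIMITS["max_utterances"]:
            problems.append(f"{name}: too many utterances")
        for u in utterances:
            if len(u) > LIMITS["max_chars"]:
                problems.append(f"{name}: utterance over {LIMITS['max_chars']} characters")
    return problems
```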
+
+ == Edit a Semantic Service
+
+ To edit a semantic service:
+
+ . From *Semantic Service Configuration*, click the three-dots menu (image:three-dots-menu.png[3%,3%]) of the semantic service you want to edit.
+ . Make the necessary edits.
+ . Click *Redeploy*.
+
+ === Redownload Vector Script
+
+ If you create new prompt topics, redownload the vector script and run it against your database again:
+
+ . From *Semantic Service Configuration*, click the three-dots menu (image:three-dots-menu.png[3%,3%]) of the advanced scale semantic service whose script you want to download.
+ . Click *Download script*.
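Both service types call the embedding endpoint described by the *URL*, *Model*, and *Auth key* parameters. Assuming an OpenAI-style `/v1/embeddings` convention (an assumption; Hugging Face endpoints differ), the call a semantic service assembles looks roughly like this sketch:

```python
import json

def embedding_request(url, model, auth_key, texts):
    """Assemble an embeddings call from the semantic service parameters.

    Returns (url, headers, body); the body follows the OpenAI
    /v1/embeddings payload convention as an illustrative assumption.
    """
    headers = {
        "Authorization": f"Bearer {auth_key}",  # *Auth key* parameter
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "input": texts})
    return url, headers, body
```

The same three parameters appear in both the advanced and basic scale setup steps, which is why the two service types are interchangeable from the embedding provider's point of view.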
