gateway/1.12/modules/ROOT/pages/flex-gateway-llm-proxy-create-llm-proxy.adoc
NOTE: A large Flex Gateway supports up to 50 LLM Proxies.

See xref:flex-gateway-managed-set-up.adoc[].
. Ensure you have the API Manager *API Creator* permission.
. Retrieve your API keys from your LLM Providers.
. xref:flex-gateway-llm-proxy-semantic-service.adoc[Configure a semantic service] if you want to use semantic routing.

[[create-an-llm-proxy]]
== Create an LLM Proxy
.. Select your *LLM Provider*.
.. Ensure the *URL* for your provider is correct. Edit if necessary.
.. Configure access details for the provider endpoint.
.. Select a *Static* or *Dynamic* API Key. If selecting *Dynamic* API Key, define a DataWeave script to extract the API Key from the incoming request.
.. Select a *Target Model* to override the model version specified in the payload. Selecting *Not Applicable* sends the request to the specified model. A *Target Model* is required for semantic routing.
+
[NOTE]
====
To configure a target model for Amazon Bedrock Claude Models, you must enter the model ID. To learn how to find the model ID, see xref:flex-gateway-llm-proxy-request.adoc#amazon-bedrock-model-names[Amazon Bedrock Model Names].
====
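A minimal sketch of the DataWeave extraction script the *Dynamic* API Key option asks for, assuming the key arrives in a custom request header (the header name `x-llm-api-key` is a hypothetical example, and the exact expression context depends on your policy configuration):

```dataweave
%dw 2.0
output application/json
---
// Hypothetical example: read the provider API key from a
// custom request header; substitute your own header name.
attributes.headers['x-llm-api-key']
```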
.. Click *Add LLM Route* to add additional routes. Complete the previous steps to configure the new route. Each LLM Provider supports one route.
. If adding multiple routes, select a *Routing strategy*. To configure your routing strategy, see:
.. <<configure-model-based-routing>>
.. <<configure-semantic-routing>>

[[configure-semantic-routing]]
== Configure Semantic Routing

To configure semantic routing:

. Ensure you have already xref:flex-gateway-llm-proxy-semantic-service.adoc[configured a semantic service].
. Configure multiple routes and select a target model for each route. Click *Add LLM Route* to create new routes.
. Select *Semantic* for *Routing strategy*.
. Click *Select a service* and select your semantic service.
. Define or select prompt topics for the routes:
** Advanced scale semantic service:
... Select prompt topics from your predefined prompt topics.
** Basic scale semantic service:
... Click *Select prompt topics*.
... Click *+ Create prompt topic*.
... Define a *Prompt topic name*.
... Define *Prompt utterances* or click *Upload utterances* to upload a plain text file containing your prompt utterances.
... Click *Create*.
... Create multiple prompt topics for each route as needed.
. Configure a *Fallback route* to send the request to if it doesn't match a semantic route:
.. Specify an accuracy threshold. When the accuracy of the semantic match is less than this threshold, traffic is sent to the fallback route.
.. Select a *Route* to fall back to.
.. Select a *Target model* for the fallback route to use.
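The fallback behavior above can be sketched as follows. This is an illustrative model of the decision, not Flex Gateway's implementation; the route names and scores are made up:

```python
def pick_route(match_scores, threshold, fallback_route):
    """Return the best-matching route, or the fallback route when the
    best semantic match scores below the accuracy threshold."""
    best_route, best_score = max(match_scores.items(), key=lambda kv: kv[1])
    if best_score < threshold:
        return fallback_route
    return best_route

# The billing route matches strongly, so it wins.
print(pick_route({"billing": 0.91, "support": 0.40}, 0.75, "general"))  # → billing
# A weak best match falls through to the fallback route.
print(pick_route({"billing": 0.55, "support": 0.40}, 0.75, "general"))  # → general
```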
. Create a *Semantic prompt guard* to block users from asking the server about specific topics:
** Advanced scale semantic service:
... Select topics from your predefined prompt topics.
** Basic scale semantic service:
... Click *+ Create deny list*.
... Define a *Prompt topic name*.
... Define *Prompt utterances* or click *Upload utterances* to upload a plain text file containing your prompt utterances.
... Click *Create*.
... Create multiple deny list topics to better protect your LLM Proxy.

NOTE: Creating a semantic prompt guard automatically applies the Semantic Prompt Guard policy.

. Return to <<create-an-llm-proxy>> step 7 to finish configuring your LLM Proxy.
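A semantic prompt guard can be thought of as the same matching step applied to deny list topics: if the request is similar enough to any deny list utterance, it is blocked. A minimal sketch, assuming a precomputed list of similarity scores and a made-up cutoff value:

```python
def is_blocked(similarities, cutoff=0.8):
    """Block the request when its best similarity to any deny list
    utterance reaches the cutoff (the 0.8 value is illustrative)."""
    return max(similarities, default=0.0) >= cutoff

print(is_blocked([0.35, 0.91]))  # → True
print(is_blocked([0.35, 0.40]))  # → False
```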
A semantic service compares an incoming request to the defined prompt topic utterances and sends the request to the route that best matches it. The semantic service also compares the request to deny list topic utterances to block certain requests.

LLM Proxy supports two types of semantic services:

* <<configure-an-advanced-scale-semantic-service, Advanced Scale>>: For complex semantic routing. Advanced scale semantic services use a vector database to store and compare prompt topic utterances. Advanced scale semantic services support unlimited prompt topics and 2000 utterances per prompt topic.
* <<configure-a-basic-scale-semantic-service, Basic Scale>>: For simple semantic routing and blocking. Basic scale semantic services support up to 6 prompt topics and 10 utterances per prompt topic.
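The comparison the semantic service performs can be sketched roughly as follows. This is a conceptual illustration using toy 3-dimensional vectors in place of real embeddings; in practice the configured embedding model produces the vectors:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def best_topic(request_vec, topic_utterance_vecs):
    """Score each topic by its closest utterance embedding and
    return (best_topic_name, best_score)."""
    scores = {
        topic: max(cosine(request_vec, u) for u in utterances)
        for topic, utterances in topic_utterance_vecs.items()
    }
    topic = max(scores, key=scores.get)
    return topic, scores[topic]

# Toy vectors standing in for real embedding-model output.
topics = {
    "refunds": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
    "shipping": [[0.0, 1.0, 0.2]],
}
topic, score = best_topic([0.95, 0.15, 0.05], topics)
print(topic)  # → refunds
```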

[[configure-an-advanced-scale-semantic-service]]
== Configure an Advanced Scale Semantic Service

. From API Manager, click *Semantic Service Configuration*.
. Click *+ Create a Semantic Service Configuration*.
. Select *Advanced Scale*.
. Configure the semantic service parameters:
** *Service label*: Label to identify the new service.
** *Embedding Service Provider*: The provider of the embedding model. *OpenAI* or *Hugging Face*.
** *URL*: The URL of the embedding service.
** *Model*: The embedding model to use.
** *Auth key*: The API authentication key for the embedding service.
. Click *Vector connection*.
. Select a *Vector Database Provider* from these options:
** *Qdrant*
** *Pinecone*
** *Azure AI Search*
. Configure the parameters to connect to your database.
. Create prompt topics:
.. Click *Create prompt topics*.
.. Define a *Prompt topic name*.
.. Define *Prompt utterances* or click *Upload utterances* to upload a plain text file containing your prompt utterances.
.. Create as many prompt topics as necessary. You can also create new prompt topics later by editing the semantic service.
+
NOTE: To deny users from asking about certain subjects, create prompt topics for the subjects and apply them as deny list topics when configuring your LLM Proxy.
. Click *Save & download script*.
. Run the downloaded `.sh` script against your database to populate it with your scaled vectors.
[[configure-a-basic-scale-semantic-service]]
== Configure a Basic Scale Semantic Service

. From API Manager, click *Semantic Service Configuration*.
. Click *+ Create a Semantic Service Configuration*.
. Select *Basic Scale*.
. Configure the semantic service parameters:
** *Service label*: Label to identify the new service.
** *Embedding Service Provider*: The provider of the embedding model. *OpenAI* or *Hugging Face*.
** *URL*: The URL of the embedding service.
** *Model*: The embedding model to use.
** *Auth key*: The API authentication key for the embedding service.
. Click *Deploy*.

[[basic-scale-semantic-service-routing-limits]]
=== Basic Scale Semantic Service Routing Limits

[%header%autowidth.spread,cols="a,a"]
|===
| Limit | Value
| Prompt topics (across all routes of an LLM Proxy) | 6
| Utterances per prompt topic | 10
| Deny list topics | 6
| Utterances per deny list topic | 10
| Maximum characters per utterance | 500
|===
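The limits in the table above can be checked before configuring a basic scale service. This helper is purely illustrative and not part of any MuleSoft tooling:

```python
# Limits from the Basic Scale Semantic Service Routing Limits table.
LIMITS = {"max_topics": 6, "max_utterances": 10, "max_chars": 500}

def validate_topics(topics):
    """topics: dict of topic name -> list of utterance strings.
    Returns a list of human-readable limit violations."""
    problems = []
    if len(topics) > LIMITS["max_topics"]:
        problems.append("too many prompt topics")
    for name, utterances in topics.items():
        if len(utterances) > LIMITS["max_utterances"]:
            problems.append(f"{name}: too many utterances")
        for u in utterances:
            if len(u) > LIMITS["max_chars"]:
                problems.append(f"{name}: utterance over 500 characters")
    return problems

print(validate_topics({"billing": ["refund status"] * 11}))
# → ['billing: too many utterances']
```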

== Edit a Semantic Service

To edit a semantic service:

. From *Semantic Service Configuration*, click the three-dots menu (image:three-dots-menu.png[3%,3%]) of the semantic service you want to edit.
. Make the necessary edits.
. Click *Redeploy*.

=== Redownload Vector Script

If you create new prompt topics, you must redownload the vector script and run it against your database again:

. From *Semantic Service Configuration*, click the three-dots menu (image:three-dots-menu.png[3%,3%]) of the advanced scale semantic service whose script you want to download.