Commit 9f91cdd

JedrzejJanasiak authored and jgiovaresco committed
feat: implementation of AI prompt guard rails policy
https://gravitee.atlassian.net/browse/APIM-8987
1 parent 3edd9d2 commit 9f91cdd

29 files changed: +1760 −25 lines

.circleci/config.yml

Lines changed: 24 additions & 24 deletions
(The removed and added lines are textually identical in this capture, so the change appears to be whitespace-only.)

```diff
@@ -3,32 +3,32 @@ version: 2.1
 setup: true
 
 orbs:
-  gravitee: gravitee-io/gravitee@5.2.0
+  gravitee: gravitee-io/gravitee@5.2.0
 
 # our single workflow, that triggers the setup job defined above, filters on tag and branches are needed otherwise
 # some workflow and job will not be triggered for tags (default CircleCI behavior)
 workflows:
-  setup_build:
-    when:
-      not: << pipeline.git.tag >>
-    jobs:
-      - gravitee/setup_plugin-build-config:
-          filters:
-            tags:
-              ignore:
-                - /.*/
+  setup_build:
+    when:
+      not: << pipeline.git.tag >>
+    jobs:
+      - gravitee/setup_plugin-build-config:
+          filters:
+            tags:
+              ignore:
+                - /.*/
 
-  setup_release:
-    when:
-      matches:
-        pattern: "/^[0-9]+\\.[0-9]+\\.[0-9]+(-(alpha|beta|rc)\\.[0-9]+)?$/"
-        value: << pipeline.git.tag >>
-    jobs:
-      - gravitee/setup_plugin-release-config:
-          filters:
-            branches:
-              ignore:
-                - /.*/
-            tags:
-              only:
-                - /^[0-9]+\.[0-9]+\.[0-9]+(-(alpha|beta|rc)\.[0-9]+)?$/
+  setup_release:
+    when:
+      matches:
+        pattern: "/^[0-9]+\\.[0-9]+\\.[0-9]+(-(alpha|beta|rc)\\.[0-9]+)?$/"
+        value: << pipeline.git.tag >>
+    jobs:
+      - gravitee/setup_plugin-release-config:
+          filters:
+            branches:
+              ignore:
+                - /.*/
+            tags:
+              only:
+                - /^[0-9]+\.[0-9]+\.[0-9]+(-(alpha|beta|rc)\.[0-9]+)?$/
```

.docgen/examples.yaml

Lines changed: 41 additions & 0 deletions
```yaml
genExamples: []
rawExamples:
  - title: Only log the request when inappropriate prompt detected
    templateRef: v4-api-proxy-with-resource
    language: json
    properties:
      phase: request
      resource: |
        {
          "name": "ai-model-text-classification-resource",
          "type": "ai-model-text-classification",
          "configuration": "{\"model\":{\"type\":\"MINILMV2_TOXIC_JIGSAW_MODEL\"}}",
          "enabled": true
        }
    file: .docgen/examples/log-request-only.json
  - title: Block request when inappropriate prompt detected
    templateRef: v4-api-proxy-with-resource
    language: json
    properties:
      phase: request
      resource: |
        {
          "name": "ai-model-text-classification-resource",
          "type": "ai-model-text-classification",
          "configuration": "{\"model\":{\"type\":\"MINILMV2_TOXIC_JIGSAW_MODEL\"}}",
          "enabled": true
        }
    file: .docgen/examples/block-request.json
  - title: Provide a custom sensitivity threshold for inappropriate prompts
    templateRef: v4-api-proxy-with-resource
    language: json
    properties:
      phase: request
      resource: |
        {
          "name": "ai-model-text-classification-resource",
          "type": "ai-model-text-classification",
          "configuration": "{\"model\":{\"type\":\"MINILMV2_TOXIC_JIGSAW_MODEL\"}}",
          "enabled": true
        }
    file: .docgen/examples/provide-custom-sensitivity-threshold.json
```
.docgen/examples/block-request.json

Lines changed: 6 additions & 0 deletions

```json
{
  "resourceName": "ai-model-text-classification-resource",
  "promptLocation": "{#request.jsonContent.prompt}",
  "contentChecks": "identity_hate,insult,obscene,severe_toxic,threat,toxic",
  "requestPolicy": "BLOCK_REQUEST"
}
```

.docgen/examples/log-request-only.json

Lines changed: 6 additions & 0 deletions

```json
{
  "resourceName": "ai-model-text-classification-resource",
  "promptLocation": "{#request.jsonContent.prompt}",
  "contentChecks": "identity_hate,insult,obscene,severe_toxic,threat,toxic",
  "requestPolicy": "LOG_REQUEST"
}
```

.docgen/examples/provide-custom-sensitivity-threshold.json

Lines changed: 7 additions & 0 deletions

```json
{
  "resourceName": "ai-model-text-classification-resource",
  "promptLocation": "{#request.jsonContent.prompt}",
  "sensitivityThreshold": 0.1,
  "contentChecks": "identity_hate,insult,obscene,severe_toxic,threat,toxic",
  "requestPolicy": "BLOCK_REQUEST"
}
```

.docgen/matrix.yaml

Lines changed: 5 additions & 0 deletions
```yaml
rows:
  - data:
      plugin: "1.0.0 and after"
      apim: "4.8.x and after"
      java: 21
```

.docgen/overview.md

Lines changed: 35 additions & 0 deletions
This policy uses an AI-powered text classification model to evaluate user prompts for potentially inappropriate or malicious content. It can detect a wide range of violations, such as profanity, sexually explicit language, harmful intent, and jailbreak prompt injections, which are adversarial inputs crafted to bypass AI safety mechanisms.

Depending on configuration, when a prompt is flagged:

* **Blocked and flagged** – the request is denied at the gateway
* **Allowed but flagged** – the request proceeds but is logged for monitoring

> **_NOTE_**: You may encounter an error when running this policy on the default Gravitee Docker image. The default images are based on Alpine Linux, which does not support the ONNX Runtime. To resolve this, use the Debian-based Gravitee Docker image, for example `graviteeio/apim-gateway:4.8.0-debian`.

## Content Checks

The Content Checks property specifies the classification labels that are applied to evaluate prompts. Choose labels that match the selected model's capabilities and your intended filtering goals, for example filtering for profanity while omitting toxicity checks.

Supported labels are documented in the model's card or configuration file.

## AI Model Resource

The policy requires an **AI Model Text Classification Resource** to be defined at the API level. This resource serves as the classification engine for evaluating prompt content during policy execution.

For more information about creating and managing resources, see [Resources](https://documentation.gravitee.io/apim/policies/resources).

After the resource is created, the policy must be configured with the corresponding name using the **AI Model Resource Name** property.

> **_NOTE_**: The policy loads the model while handling the first request made to the API, so this first call takes longer than usual because it includes the model loading time. Subsequent requests are processed faster.
27+
28+
## Notice
29+
30+
This plugin allows usage of models based on meta LLama4:
31+
32+
* [gravitee-io/Llama-Prompt-Guard-2-22M-onxx](https://huggingface.co/gravitee-io/Llama-Prompt-Guard-2-22M-onnx)
33+
* [gravitee-io/Llama-Prompt-Guard-2-86M-onxx](https://huggingface.co/gravitee-io/Llama-Prompt-Guard-2-86M-onnx)
34+
35+
> Llama 4 is licensed under the Llama 4 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.

CHANGELOG.md

Whitespace-only changes.

NOTICE.txt

Lines changed: 1 addition & 0 deletions
Llama 4 is licensed under the Llama 4 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.

README.md

Lines changed: 229 additions & 1 deletion
<!-- GENERATED CODE - DO NOT ALTER THIS OR THE FOLLOWING LINES -->
# AI - Prompt Guard Rails

[![Gravitee.io](https://img.shields.io/static/v1?label=Available%20at&message=Gravitee.io&color=1EC9D2)](https://download.gravitee.io/#graviteeio-apim/plugins/policies/gravitee-policy-ai-prompt-guard-rails/)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/gravitee-io/gravitee-policy-ai-prompt-guard-rails/blob/master/LICENSE.txt)
[![Releases](https://img.shields.io/badge/semantic--release-conventional%20commits-e10079?logo=semantic-release)](https://github.com/gravitee-io/gravitee-policy-ai-prompt-guard-rails/releases)
[![CircleCI](https://circleci.com/gh/gravitee-io/gravitee-policy-ai-prompt-guard-rails.svg?style=svg)](https://circleci.com/gh/gravitee-io/gravitee-policy-ai-prompt-guard-rails)

## Overview

This policy uses an AI-powered text classification model to evaluate user prompts for potentially inappropriate or malicious content. It can detect a wide range of violations, such as profanity, sexually explicit language, harmful intent, and jailbreak prompt injections, which are adversarial inputs crafted to bypass AI safety mechanisms.

Depending on configuration, when a prompt is flagged:

* **Blocked and flagged** – the request is denied at the gateway
* **Allowed but flagged** – the request proceeds but is logged for monitoring

> **_NOTE_**: You may encounter an error when running this policy on the default Gravitee Docker image. The default images are based on Alpine Linux, which does not support the ONNX Runtime. To resolve this, use the Debian-based Gravitee Docker image, for example `graviteeio/apim-gateway:4.8.0-debian`.

## Content Checks

The Content Checks property specifies the classification labels that are applied to evaluate prompts. Choose labels that match the selected model's capabilities and your intended filtering goals, for example filtering for profanity while omitting toxicity checks.

Supported labels are documented in the model's card or configuration file.

## AI Model Resource

The policy requires an **AI Model Text Classification Resource** to be defined at the API level. This resource serves as the classification engine for evaluating prompt content during policy execution.

For more information about creating and managing resources, see [Resources](https://documentation.gravitee.io/apim/policies/resources).

After the resource is created, the policy must be configured with the corresponding name using the **AI Model Resource Name** property.

> **_NOTE_**: The policy loads the model while handling the first request made to the API, so this first call takes longer than usual because it includes the model loading time. Subsequent requests are processed faster.

## Notice

This plugin supports models based on Meta Llama 4:

* [gravitee-io/Llama-Prompt-Guard-2-22M-onnx](https://huggingface.co/gravitee-io/Llama-Prompt-Guard-2-22M-onnx)
* [gravitee-io/Llama-Prompt-Guard-2-86M-onnx](https://huggingface.co/gravitee-io/Llama-Prompt-Guard-2-86M-onnx)

> Llama 4 is licensed under the Llama 4 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.

## Phases
The `ai-prompt-guard-rails` policy can be applied to the following API types and flow phases.

### Compatible API types

* `PROXY`

### Supported flow phases

* Request

## Compatibility matrix

Strikethrough text indicates that a version is deprecated.

| Plugin version | APIM | Java version |
| --- | --- | --- |
| 1.0.0 and after | 4.8.x and after | 21 |

## Configuration options

| Name <br>`json name` | Type <br>`constraint` | Mandatory | Default | Description |
|:----------------------|:-----------------------|:----------:|:---------|:-------------|
| Content Checks<br>`contentChecks` | string | | | Comma-separated list of model labels (e.g., TOXIC,OBSCENE) |
| Prompt Location<br>`promptLocation` | string | | | Expression Language reference to the prompt to evaluate |
| Request Policy<br>`requestPolicy` | enum (string) | | `LOG_REQUEST` | Action taken when a prompt is flagged<br>Values: `BLOCK_REQUEST` `LOG_REQUEST` |
| Resource Name<br>`resourceName` | string | | | The resource name loading the Text Classification model |
| Sensitivity threshold<br>`sensitivityThreshold` | number | | `0.5` | Classification score above which a checked label flags the prompt |
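The `promptLocation` values in the examples use the Gravitee Expression Language reference `{#request.jsonContent.prompt}`, which reads the `prompt` field from the parsed JSON request body. As a rough, illustrative Python equivalent of that extraction (the function is hypothetical, not part of the plugin):

```python
import json

def extract_prompt(body: str, field: str = "prompt"):
    """Rough equivalent of the EL reference {#request.jsonContent.prompt}:
    parse the request body as JSON and read the named field."""
    try:
        return json.loads(body).get(field)
    except (json.JSONDecodeError, AttributeError):
        return None

print(extract_prompt('{"prompt": "Summarize this article"}'))  # Summarize this article
print(extract_prompt("not json"))  # None
```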
## Examples

*Only log the request when inappropriate prompt detected*

```json
{
  "api": {
    "definitionVersion": "V4",
    "type": "PROXY",
    "name": "AI - Prompt Guard Rails example API",
    "resources": [
      {
        "name": "ai-model-text-classification-resource",
        "type": "ai-model-text-classification",
        "configuration": "{\"model\":{\"type\":\"MINILMV2_TOXIC_JIGSAW_MODEL\"}}",
        "enabled": true
      }
    ],
    "flows": [
      {
        "name": "Common Flow",
        "enabled": true,
        "selectors": [
          {
            "type": "HTTP",
            "path": "/",
            "pathOperator": "STARTS_WITH"
          }
        ],
        "request": [
          {
            "name": "AI - Prompt Guard Rails",
            "enabled": true,
            "policy": "ai-prompt-guard-rails",
            "configuration": {
              "resourceName": "ai-model-text-classification-resource",
              "promptLocation": "{#request.jsonContent.prompt}",
              "contentChecks": "identity_hate,insult,obscene,severe_toxic,threat,toxic",
              "requestPolicy": "LOG_REQUEST"
            }
          }
        ]
      }
    ]
  }
}
```

*Block request when inappropriate prompt detected*

```json
{
  "api": {
    "definitionVersion": "V4",
    "type": "PROXY",
    "name": "AI - Prompt Guard Rails example API",
    "resources": [
      {
        "name": "ai-model-text-classification-resource",
        "type": "ai-model-text-classification",
        "configuration": "{\"model\":{\"type\":\"MINILMV2_TOXIC_JIGSAW_MODEL\"}}",
        "enabled": true
      }
    ],
    "flows": [
      {
        "name": "Common Flow",
        "enabled": true,
        "selectors": [
          {
            "type": "HTTP",
            "path": "/",
            "pathOperator": "STARTS_WITH"
          }
        ],
        "request": [
          {
            "name": "AI - Prompt Guard Rails",
            "enabled": true,
            "policy": "ai-prompt-guard-rails",
            "configuration": {
              "resourceName": "ai-model-text-classification-resource",
              "promptLocation": "{#request.jsonContent.prompt}",
              "contentChecks": "identity_hate,insult,obscene,severe_toxic,threat,toxic",
              "requestPolicy": "BLOCK_REQUEST"
            }
          }
        ]
      }
    ]
  }
}
```

*Provide a custom sensitivity threshold for inappropriate prompts*

```json
{
  "api": {
    "definitionVersion": "V4",
    "type": "PROXY",
    "name": "AI - Prompt Guard Rails example API",
    "resources": [
      {
        "name": "ai-model-text-classification-resource",
        "type": "ai-model-text-classification",
        "configuration": "{\"model\":{\"type\":\"MINILMV2_TOXIC_JIGSAW_MODEL\"}}",
        "enabled": true
      }
    ],
    "flows": [
      {
        "name": "Common Flow",
        "enabled": true,
        "selectors": [
          {
            "type": "HTTP",
            "path": "/",
            "pathOperator": "STARTS_WITH"
          }
        ],
        "request": [
          {
            "name": "AI - Prompt Guard Rails",
            "enabled": true,
            "policy": "ai-prompt-guard-rails",
            "configuration": {
              "resourceName": "ai-model-text-classification-resource",
              "promptLocation": "{#request.jsonContent.prompt}",
              "sensitivityThreshold": 0.1,
              "contentChecks": "identity_hate,insult,obscene,severe_toxic,threat,toxic",
              "requestPolicy": "BLOCK_REQUEST"
            }
          }
        ]
      }
    ]
  }
}
```
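The third example lowers `sensitivityThreshold` from the default `0.5` to `0.1`, which makes the guard rail stricter: a checked label's score only needs to exceed `0.1` for the prompt to be flagged. A small illustrative comparison, using a hypothetical score rather than real model output:

```python
def is_flagged(score: float, threshold: float) -> bool:
    """A label flags the prompt when its score exceeds the threshold,
    so a lower threshold means stricter filtering."""
    return score > threshold

borderline = 0.3  # hypothetical "toxic" score for a mildly rude prompt
print(is_flagged(borderline, 0.5))  # False: passes under the default threshold
print(is_flagged(borderline, 0.1))  # True: flagged under the stricter 0.1
```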

## Changelog
