Commit a3287b5

add docs
1 parent e6acb25 commit a3287b5

File tree: 1 file changed, +4 -3 lines


docs/en/latest/plugins/ai-rate-limiting.md

Lines changed: 4 additions & 3 deletions
```diff
@@ -35,17 +35,18 @@ The `ai-rate-limiting` plugin enforces token-based rate limiting for requests sent
 
 | Name | Type | Required | Description |
 | ---- | ---- | -------- | ----------- |
-| `limit` | integer | false | The maximum number of tokens allowed to consume within a given time interval. At least one of `limit` and `instances.limit` should be configured. |
-| `time_window` | integer | false | The time interval corresponding to the rate limiting `limit` in seconds. At least one of `time_window` and `instances.time_window` should be configured. |
+| `limit` | integer | conditionally | The maximum number of tokens allowed to consume within a given time interval. At least one of `limit` and `instances.limit` should be configured. |
+| `time_window` | integer | conditionally | The time interval corresponding to the rate limiting `limit` in seconds. At least one of `time_window` and `instances.time_window` should be configured. |
 | `show_limit_quota_header` | boolean | false | If true, include `X-AI-RateLimit-Limit-*` to show the total quota, `X-AI-RateLimit-Remaining-*` to show the remaining quota in the response header, and `X-AI-RateLimit-Reset-*` to show the number of seconds left for the counter to reset, where `*` is the instance name. Default: `true` |
 | `limit_strategy` | string | false | Type of token to apply rate limiting. `total_tokens`, `prompt_tokens`, and `completion_tokens` values are returned in each model response, where `total_tokens` is the sum of `prompt_tokens` and `completion_tokens`. Default: `total_tokens` |
-| `instances` | array[object] | false | LLM instance rate limiting configurations. |
+| `instances` | array[object] | conditionally | LLM instance rate limiting configurations. |
 | `instances.name` | string | true | Name of the LLM service instance. |
 | `instances.limit` | integer | true | The maximum number of tokens allowed to consume within a given time interval. |
 | `instances.time_window` | integer | true | The time interval corresponding to the rate limiting `limit` in seconds. |
 | `rejected_code` | integer | false | The HTTP status code returned when a request exceeding the quota is rejected. Default: `503` |
 | `rejected_msg` | string | false | The response body returned when a request exceeding the quota is rejected. |
 
+If `limit` is configured, `time_window` must also be configured. Otherwise, specifying only `instances` suffices.
 ## Example
 
 Create a route as such and update with your LLM providers, models, API keys, and endpoints:
```
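As a hedged illustration of the conditional requirement this commit documents (the instance name and all numeric values below are invented for this sketch, not taken from the commit), a plugin configuration may omit the top-level `limit`/`time_window` pair as long as per-instance limits are supplied:

```json
{
  "ai-rate-limiting": {
    "instances": [
      {
        "name": "openai-instance",
        "limit": 100,
        "time_window": 60
      }
    ],
    "limit_strategy": "total_tokens",
    "rejected_code": 429,
    "rejected_msg": "Token quota exhausted, please retry later"
  }
}
```

Conversely, per the added note, a configuration that sets only the top-level `limit` without `time_window` would be incomplete.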

0 commit comments
