-|`limit`| integer |false| The maximum number of tokens allowed to consume within a given time interval. At least one of `limit` and `instances.limit` should be configured. |
-|`time_window`| integer |false| The time interval corresponding to the rate limiting `limit` in seconds. At least one of `time_window` and `instances.time_window` should be configured. |
+|`limit`| integer |conditionally| The maximum number of tokens allowed to consume within a given time interval. At least one of `limit` and `instances.limit` should be configured. |
+|`time_window`| integer |conditionally| The time interval corresponding to the rate limiting `limit` in seconds. At least one of `time_window` and `instances.time_window` should be configured. |
 |`show_limit_quota_header`| boolean | false | If true, include `X-AI-RateLimit-Limit-*` to show the total quota, `X-AI-RateLimit-Remaining-*` to show the remaining quota in the response header, and `X-AI-RateLimit-Reset-*` to show the number of seconds left for the counter to reset, where `*` is the instance name. Default: `true`|
 |`limit_strategy`| string | false | Type of token to apply rate limiting. `total_tokens`, `prompt_tokens`, and `completion_tokens` values are returned in each model response, where `total_tokens` is the sum of `prompt_tokens` and `completion_tokens`. Default: `total_tokens`|
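
The "conditionally" required wording for `limit` and `time_window` can be read as an "at least one of" check. The following is a minimal sketch of that rule, assuming the plugin configuration is available as a plain dict and that `instances` is a list of per-instance dicts. The helper name `validate_rate_limit_conf` is hypothetical and is not part of the plugin; the plugin's real schema validation may differ.

```python
def validate_rate_limit_conf(conf):
    """Return a list of problems; an empty list means the check passes.

    Assumption: `conf` is a dict and `conf["instances"]`, if present,
    is a list of dicts that may each carry `limit` and `time_window`.
    """
    problems = []
    instances = conf.get("instances") or []
    for field in ("limit", "time_window"):
        # The field may be set at the top level ...
        top_level = conf.get(field) is not None
        # ... or on at least one instance.
        per_instance = any(inst.get(field) is not None for inst in instances)
        if not (top_level or per_instance):
            problems.append(
                "at least one of `%s` and `instances.%s` should be configured"
                % (field, field)
            )
    return problems


# Hypothetical usage: top-level values only, then instance-level values only.
print(validate_rate_limit_conf({"limit": 100, "time_window": 60}))
print(validate_rate_limit_conf(
    {"instances": [{"name": "example", "limit": 10, "time_window": 30}]}
))
```

Under this reading, a configuration that sets neither the top-level field nor the instance-level field is reported once per missing field.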