---
title: "What's New in Apache APISIX 3.16: Dynamic Rate Limiting for Your API Gateway"
authors:
  - name: "Ming Wen"
    title: "Author"
    url: "https://github.com/moonming"
    image_url: "https://github.com/moonming.png"
keywords:
  - Apache APISIX
  - API Gateway
  - Rate Limiting
  - Dynamic Rate Limiting
  - AI Gateway
  - Multi-Tenant
  - Token Budget
description: Apache APISIX 3.16 introduces dynamic rate limiting with multiple rules and variable support across limit-count, limit-conn, and ai-rate-limiting plugins, enabling context-aware, per-tier, and multi-tenant rate limiting in a single route configuration.
tags: [Community]
---

Rate limiting is one of the most critical capabilities in any API gateway. Yet for years, most gateways — including APISIX — have treated it as a static, one-size-fits-all configuration: set a number, set a time window, done.

In practice, real-world rate limiting is far more nuanced. A SaaS platform needs different quotas for free and paid users. An AI gateway must enforce token budgets that vary by model and consumer. A multi-tenant API must isolate rate limits per tenant without duplicating routes.

Apache APISIX 3.16 addresses these challenges head-on with two powerful enhancements to its rate limiting plugins: **multiple rules** and **variable support**. Together, they transform rate limiting from a static configuration into a dynamic, context-aware policy engine.

<!--truncate-->

## What Changed in APISIX 3.16

APISIX 3.16 introduces two complementary features across the `limit-count`, `limit-conn`, and `ai-rate-limiting` plugins:

| Feature | Description | Supported Plugins |
|---------|-------------|-------------------|
| Multiple rules | Define an array of rate limiting rules with independent thresholds and time windows | `limit-count`, `limit-conn`, `ai-rate-limiting` |
| Variable support | Use APISIX variables (`$remote_addr`, `$http_*`, `$consumer_name`, etc.) in `key` and plugin-specific rate or threshold fields | `limit-count`, `limit-conn`, `ai-rate-limiting` |

Both features are fully backward compatible. Existing configurations continue to work without modification.

## Multiple Rules: Beyond Single-Threshold Rate Limiting

### The Problem

Consider a common requirement: limit an API to **10 requests per second** and **500 requests per minute**. Before 3.16, you had to configure two separate plugin instances or chain multiple routes. This was verbose, error-prone, and hard to maintain.

### The Solution

The new `rules` array lets you define multiple rate limiting policies in a single plugin configuration. Each rule operates independently with its own counter, time window, and key.

```json
{
  "uri": "/api/v1/*",
  "plugins": {
    "limit-count": {
      "rules": [
        {
          "count": 10,
          "time_window": 1,
          "key": "${remote_addr}_per_second",
          "header_prefix": "per-second"
        },
        {
          "count": 500,
          "time_window": 60,
          "key": "${remote_addr}_per_minute",
          "header_prefix": "per-minute"
        },
        {
          "count": 10000,
          "time_window": 86400,
          "key": "${remote_addr}_per_day",
          "header_prefix": "per-day"
        }
      ],
      "rejected_code": 429
    }
  },
  "upstream": {
    "type": "roundrobin",
    "nodes": {
      "127.0.0.1:1980": 1
    }
  }
}
```

With this configuration, APISIX enforces all three limits simultaneously. A client hitting the per-second limit receives a `429` response with headers indicating which limit was exceeded:

```
X-Per-Second-RateLimit-Limit: 10
X-Per-Second-RateLimit-Remaining: 0
X-Per-Second-RateLimit-Reset: 1
X-Per-Minute-RateLimit-Limit: 500
X-Per-Minute-RateLimit-Remaining: 499
X-Per-Minute-RateLimit-Reset: 60
```

The `header_prefix` field lets clients distinguish which rule triggered the rejection — critical for debugging and client-side retry logic.

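
Conceptually, each rule maintains an independent fixed-window counter, and a request is admitted only if every rule still has quota for its key. The following Python sketch illustrates these semantics — it is an illustration of the fixed-window idea only, not APISIX's actual implementation:

```python
class FixedWindowRule:
    """One rate limiting rule with its own threshold, window, and counters."""

    def __init__(self, count, time_window):
        self.count = count                # max requests per window
        self.time_window = time_window    # window length in seconds
        self.windows = {}                 # key -> (window_start, requests_used)

    def allow(self, key, now):
        start, used = self.windows.get(key, (now, 0))
        if now - start >= self.time_window:
            start, used = now, 0          # window expired: start a fresh one
        if used >= self.count:
            self.windows[key] = (start, used)
            return False
        self.windows[key] = (start, used + 1)
        return True


def check(rules, key, now):
    """A request is admitted only if every rule still has quota for this key."""
    return all(rule.allow(key, now) for rule in rules)


# Two rules in miniature: 2 requests/second and 3 requests/minute.
rules = [FixedWindowRule(2, 1), FixedWindowRule(3, 60)]
```

Driving `check(rules, client_ip, t)` with a burst of requests shows the per-second rule rejecting first; once its window rolls over, the per-minute rule becomes the binding constraint.
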
## Variable Support: Context-Aware Rate Limiting

### The Problem

Static rate limits assume every consumer is equal. In reality, a free-tier user and an enterprise customer should have very different quotas. Before 3.16, supporting this meant creating separate routes for each tier — leading to route explosion and configuration drift.

### The Solution

Variable support lets you pull rate limiting parameters directly from the request context. The `count`, `time_window`, and `key` fields now accept APISIX variables.

### Example 1: Per-Tier Rate Limiting via HTTP Header

Suppose your authentication middleware injects an `X-Rate-Quota` header based on the user's subscription tier. Pair `limit-count` with an auth plugin such as `key-auth` so that `${consumer_name}` is available as the rate limit key:

```json
{
  "uri": "/api/v1/*",
  "plugins": {
    "key-auth": {},
    "limit-count": {
      "rules": [
        {
          "count": "${http_x_rate_quota ?? 100}",
          "time_window": 60,
          "key": "${consumer_name}"
        }
      ],
      "rejected_code": 429
    }
  },
  "upstream": {
    "type": "roundrobin",
    "nodes": {
      "127.0.0.1:1980": 1
    }
  }
}
```

Now the same route handles all tiers:

| Tier | `X-Rate-Quota` Header | Effective Limit |
|------|----------------------|-----------------|
| Free | 100 | 100 req/min |
| Pro | 1000 | 1,000 req/min |
| Enterprise | 50000 | 50,000 req/min |

One route. One plugin configuration. All tiers.

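
Conceptually, an expression like `${http_x_rate_quota ?? 100}` is resolved per request against the available variables, with the value after `??` used as a fallback when the variable is absent. A small Python sketch of that substitution — a hypothetical helper for illustration, not APISIX's own parser:

```python
import re

VAR = re.compile(r"\$\{([^}]+)\}")

def resolve(template, ctx):
    """Expand ${name} and ${name ?? fallback} against a dict of request
    variables, e.g. {"http_x_rate_quota": "1000"}."""
    def repl(match):
        expr = match.group(1)
        if "??" in expr:
            name, fallback = (part.strip() for part in expr.split("??", 1))
        else:
            name, fallback = expr.strip(), ""
        value = ctx.get(name)
        return value if value is not None else fallback
    return VAR.sub(repl, template)
```

With this, a request carrying `X-Rate-Quota: 1000` resolves to a count of 1000, while a request without the header falls back to 100.
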
### Example 2: Multi-Tenant Isolation with Variable Combination

For a multi-tenant SaaS API, you can combine variables to create isolated rate limit buckets per tenant per endpoint:

```json
{
  "uri": "/api/v1/*",
  "plugins": {
    "limit-count": {
      "rules": [
        {
          "count": 1000,
          "time_window": 60,
          "key": "${http_x_tenant_id} ${uri}"
        }
      ],
      "rejected_code": 429
    }
  },
  "upstream": {
    "type": "roundrobin",
    "nodes": {
      "127.0.0.1:1980": 1
    }
  }
}
```

Tenant A calling `/api/v1/users` and Tenant B calling the same endpoint get independent counters. Tenant A calling `/api/v1/orders` gets yet another counter. This creates natural per-tenant-per-endpoint isolation without any route duplication.

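
The key template `${http_x_tenant_id} ${uri}` expands to a distinct string per tenant-endpoint pair, and each distinct string gets its own counter. A minimal Python sketch of this bucketing (illustrative only, with a small quota for the demo):

```python
class WindowCounter:
    """Fixed-window request counter keyed by an arbitrary bucket string."""

    def __init__(self, count, time_window):
        self.count = count
        self.time_window = time_window
        self.buckets = {}                 # bucket -> (window_start, used)

    def allow(self, bucket, now):
        start, used = self.buckets.get(bucket, (now, 0))
        if now - start >= self.time_window:
            start, used = now, 0
        admitted = used < self.count
        self.buckets[bucket] = (start, used + 1 if admitted else used)
        return admitted


limiter = WindowCounter(count=2, time_window=60)


def bucket(tenant_id, uri):
    # Mirrors key "${http_x_tenant_id} ${uri}": one counter per tenant per endpoint.
    return f"{tenant_id} {uri}"
```

Exhausting one tenant's quota on one endpoint leaves every other tenant-endpoint bucket untouched.
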
### Example 3: Dynamic Concurrent Connection Limits

The `limit-conn` plugin also supports rules and variables, enabling dynamic concurrency control. The example below uses `key-auth` so each consumer gets its own connection quota, while a shared cap applies across all consumers using `${http_host ?? global}` as the shared key:

```json
{
  "uri": "/api/v1/inference",
  "plugins": {
    "key-auth": {},
    "limit-conn": {
      "default_conn_delay": 0.1,
      "rules": [
        {
          "conn": 5,
          "burst": 2,
          "key": "${consumer_name}"
        },
        {
          "conn": 100,
          "burst": 20,
          "key": "${http_host ?? global}"
        }
      ],
      "rejected_code": 503
    }
  },
  "upstream": {
    "type": "roundrobin",
    "nodes": {
      "127.0.0.1:1980": 1
    }
  }
}
```

This limits each consumer to 5 concurrent connections (plus a burst allowance of 2) while capping the total at 100 (plus a burst of 20) — preventing any single consumer from monopolizing backend capacity.

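
Unlike `limit-count`, which counts requests per window, `limit-conn` tracks connections that are currently in flight. The Python sketch below mirrors the two rules above; it is illustrative only, and the rollback on partial failure is a simplification rather than a claim about the plugin's internals:

```python
class ConnLimitRule:
    """Track in-flight connections per key; admit up to conn + burst."""

    def __init__(self, conn, burst):
        self.limit = conn + burst
        self.active = {}                  # key -> currently open connections

    def acquire(self, key):
        if self.active.get(key, 0) >= self.limit:
            return False
        self.active[key] = self.active.get(key, 0) + 1
        return True

    def release(self, key):
        self.active[key] -= 1


# Mirrors the two rules above: 5 (+2 burst) per consumer, 100 (+20) shared.
per_consumer = ConnLimitRule(conn=5, burst=2)
shared = ConnLimitRule(conn=100, burst=20)


def open_conn(consumer):
    """Admit a new connection only if both rules allow it."""
    if not per_consumer.acquire(consumer):
        return False
    if not shared.acquire("global"):
        per_consumer.release(consumer)    # roll back the per-consumer slot
        return False
    return True
```

One consumer holding its full allowance of 7 in-flight connections is rejected on the next attempt, while other consumers are still admitted under the shared cap.
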
## AI Rate Limiting: Token Budget Management

For AI gateway use cases, the `ai-rate-limiting` plugin works alongside `ai-proxy` to enforce token budgets at the gateway level. It combines multiple rules with variable support for fine-grained control:

```json
{
  "uri": "/v1/chat/completions",
  "plugins": {
    "ai-rate-limiting": {
      "limit_strategy": "total_tokens",
      "rules": [
        {
          "count": 10000,
          "time_window": 60,
          "key": "${consumer_name}_per_minute",
          "header_prefix": "consumer"
        },
        {
          "count": 500000,
          "time_window": 86400,
          "key": "${consumer_name}_per_day",
          "header_prefix": "daily"
        },
        {
          "count": 1000000,
          "time_window": 60,
          "key": "${http_host ?? global}",
          "header_prefix": "global"
        }
      ],
      "rejected_code": 429
    }
  },
  "upstream": {
    "type": "roundrobin",
    "nodes": {
      "127.0.0.1:1980": 1
    }
  }
}
```

This configuration enforces three simultaneous constraints:

1. **Per-consumer burst**: 10,000 tokens per minute per consumer
2. **Per-consumer daily**: 500,000 tokens per day per consumer
3. **Global capacity**: 1,000,000 tokens per minute across all consumers

As AI API costs scale directly with token usage, this kind of layered budget control is essential for production AI gateways.

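
Token-based limiting differs from request counting in that each request consumes a variable amount of quota. The sketch below charges a token cost against both a per-minute and a per-day budget; it is illustrative only, and note that a real gateway only learns the exact token count from the model's response, so accounting happens after the fact:

```python
class TokenWindow:
    """Fixed window that charges a variable token cost per request."""

    def __init__(self, budget, time_window):
        self.budget = budget
        self.time_window = time_window
        self.state = {}                   # key -> (window_start, tokens_used)

    def charge(self, key, tokens, now):
        start, used = self.state.get(key, (now, 0))
        if now - start >= self.time_window:
            start, used = now, 0
        if used + tokens > self.budget:
            self.state[key] = (start, used)
            return False
        self.state[key] = (start, used + tokens)
        return True


# Two of the budgets above: 10k tokens/minute and 500k tokens/day per consumer.
per_minute = TokenWindow(budget=10_000, time_window=60)
per_day = TokenWindow(budget=500_000, time_window=86_400)


def admit(consumer, tokens, now):
    """A request passes only if it fits both the minute and the daily budget."""
    return (per_minute.charge(consumer + "_per_minute", tokens, now)
            and per_day.charge(consumer + "_per_day", tokens, now))
```

A 6,000-token request passes, a second one in the same minute would exceed the 10k/minute budget and is rejected, and once the minute window rolls over the daily budget becomes the binding constraint.
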
## Combining Multiple Rules with Variables

The real power emerges when you combine both features. Here is a complete example for an API platform with tiered pricing. It uses `key-auth` to identify consumers, reads per-consumer quotas from request headers, and maintains a shared global safety cap via `${http_host ?? global}`:

```json
{
  "uri": "/api/v1/*",
  "plugins": {
    "key-auth": {},
    "limit-count": {
      "rules": [
        {
          "count": "${http_x_burst_quota ?? 10}",
          "time_window": 1,
          "key": "${consumer_name}_per_second",
          "header_prefix": "burst"
        },
        {
          "count": "${http_x_sustained_quota ?? 500}",
          "time_window": 60,
          "key": "${consumer_name}_per_minute",
          "header_prefix": "sustained"
        },
        {
          "count": 100000,
          "time_window": 60,
          "key": "${http_host ?? global}",
          "header_prefix": "global"
        }
      ],
      "rejected_code": 429
    }
  },
  "upstream": {
    "type": "roundrobin",
    "nodes": {
      "127.0.0.1:1980": 1
    }
  }
}
```

The authentication layer sets per-consumer burst and sustained quotas via headers. APISIX enforces both per-consumer limits dynamically while maintaining a static global safety cap. No route duplication. No configuration drift between tiers.

## What's Next

The `limit-req` plugin (leaky bucket algorithm) does not yet support the `rules` array ([#13179](https://github.com/apache/apisix/issues/13179)). We welcome community contributions to bring it to feature parity.

We are also exploring deeper integration with external policy engines, enabling rate limiting quotas to be fetched from external key-value stores or policy services at runtime.

## Getting Started

Upgrade to APISIX 3.16:

```bash
# Docker
docker pull apache/apisix:3.16.0

# Helm
helm repo update
helm upgrade apisix apisix/apisix --set image.tag=3.16.0
```

Check the full documentation:

- [limit-count plugin](https://apisix.apache.org/docs/apisix/plugins/limit-count/)
- [limit-conn plugin](https://apisix.apache.org/docs/apisix/plugins/limit-conn/)
- [ai-rate-limiting plugin](https://apisix.apache.org/docs/apisix/plugins/ai-rate-limiting/)
