blog: add article on APISIX 3.16 dynamic rate limiting #2028
Merged: Baoyuantop merged 6 commits into apache:master from moonming:blog/apisix-3.16-dynamic-rate-limiting on Apr 16, 2026 (+332 −0)
- b2338b0 blog: add article on APISIX 3.16 dynamic rate limiting features (moonming)
- 54c4348 Apply suggestion from @Baoyuantop (Baoyuantop)
- 4a27029 fix: add prerequisites and correct constant key in rate limiting exam… (Yilialinn)
- ef747f4 fix: weave prerequisites naturally into blog prose (Yilialinn)
- 2202e5a Update blog/en/blog/2026/04/14/apisix-3.16-dynamic-rate-limiting.md (Yilialinn)
- b79e7e3 fix catalog (Yilialinn)
332 changes: 332 additions & 0 deletions
blog/en/blog/2026/04/14/apisix-3.16-dynamic-rate-limiting.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,332 @@ | ||
| --- | ||
| title: "What's New in Apache APISIX 3.16: Dynamic Rate Limiting for Your API Gateway" | ||
| authors: | ||
| - name: "Ming Wen" | ||
| title: "Author" | ||
| url: "https://github.com/moonming" | ||
| image_url: "https://github.com/moonming.png" | ||
| keywords: | ||
| - Apache APISIX | ||
| - API Gateway | ||
| - Rate Limiting | ||
| - Dynamic Rate Limiting | ||
| - AI Gateway | ||
| - Multi-Tenant | ||
| - Token Budget | ||
| description: Apache APISIX 3.16 introduces dynamic rate limiting with multiple rules and variable support across limit-count, limit-conn, and ai-rate-limiting plugins, enabling context-aware, per-tier, and multi-tenant rate limiting in a single route configuration. | ||
| tags: [Community] | ||
| --- | ||
|
|
||
| Rate limiting is one of the most critical capabilities in any API gateway. Yet for years, most gateways — including APISIX — have treated it as a static, one-size-fits-all configuration: set a number, set a time window, done. | ||
|
|
||
| In practice, real-world rate limiting is far more nuanced. A SaaS platform needs different quotas for free and paid users. An AI gateway must enforce token budgets that vary by model and consumer. A multi-tenant API must isolate rate limits per tenant without duplicating routes. | ||
|
|
||
| Apache APISIX 3.16 addresses these challenges head-on with two powerful enhancements to the rate limiting plugins: **multiple rules** and **variable support**. Together, they transform rate limiting from static configuration into a dynamic, context-aware policy engine. | ||
|
|
||
| <!--truncate--> | ||
|
|
||
| ## What Changed in APISIX 3.16 | ||
|
|
||
| APISIX 3.16 introduces two complementary features across the `limit-count`, `limit-conn`, and `ai-rate-limiting` plugins: | ||
|
|
||
| | Feature | Description | Supported Plugins | | ||
| |---------|-------------|-------------------| | ||
| | Multiple rules | Define an array of rate limiting rules with independent thresholds and time windows | `limit-count`, `limit-conn`, `ai-rate-limiting` | | ||
| | Variable support | Use APISIX variables (`$remote_addr`, `$http_*`, `$consumer_name`, etc.) in `key` and plugin-specific rate or threshold fields | `limit-count`, `limit-conn`, `ai-rate-limiting` | | ||
|
|
||
| Both features are fully backward compatible. Existing configurations continue to work without modification. | ||
|
|
||
| ## Multiple Rules: Beyond Single-Threshold Rate Limiting | ||
|
|
||
| ### The Problem | ||
|
|
||
| Consider a common requirement: limit an API to **10 requests per second** and **500 requests per minute**. Before 3.16, you had to configure two separate plugin instances or chain multiple routes. This was verbose, error-prone, and hard to maintain. | ||
|
|
||
| ### The Solution | ||
|
|
||
| The new `rules` array lets you define multiple rate limiting policies in a single plugin configuration. Each rule operates independently with its own counter, time window, and key. | ||
|
|
||
| ```json | ||
| { | ||
| "uri": "/api/v1/*", | ||
| "plugins": { | ||
| "limit-count": { | ||
| "rules": [ | ||
| { | ||
| "count": 10, | ||
| "time_window": 1, | ||
| "key": "${remote_addr}_per_second", | ||
| "header_prefix": "per-second" | ||
| }, | ||
| { | ||
| "count": 500, | ||
| "time_window": 60, | ||
| "key": "${remote_addr}_per_minute", | ||
| "header_prefix": "per-minute" | ||
| }, | ||
| { | ||
| "count": 10000, | ||
| "time_window": 86400, | ||
| "key": "${remote_addr}_per_day", | ||
| "header_prefix": "per-day" | ||
| } | ||
| ], | ||
| "rejected_code": 429 | ||
| } | ||
| }, | ||
| "upstream": { | ||
| "type": "roundrobin", | ||
| "nodes": { | ||
| "127.0.0.1:1980": 1 | ||
| } | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| With this configuration, APISIX enforces all three limits simultaneously. A client hitting the per-second limit receives a `429` response with headers indicating which limit was exceeded (the values are illustrative, and the per-day headers are omitted here): | ||
|
|
||
| ``` | ||
| X-Per-Second-RateLimit-Limit: 10 | ||
| X-Per-Second-RateLimit-Remaining: 0 | ||
| X-Per-Second-RateLimit-Reset: 1 | ||
| X-Per-Minute-RateLimit-Limit: 500 | ||
| X-Per-Minute-RateLimit-Remaining: 499 | ||
| X-Per-Minute-RateLimit-Reset: 60 | ||
| ``` | ||
|
|
||
| The `header_prefix` field lets clients distinguish which rule triggered the rejection — critical for debugging and client-side retry logic. | ||
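The behaviour described above, where every rule keeps its own counter and any exhausted rule rejects the request, can be sketched in a few lines of Python. This is a simplified fixed-window model for illustration only, not APISIX's implementation: the `Rule` class, the `check` helper, and the client address `203.0.113.7` are hypothetical, and the sketch assumes that a rejected request still counts against every rule.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Rule:
    """One entry of the `rules` array: a threshold, a window, and a resolved key."""
    count: int
    time_window: int  # seconds
    key: str
    header_prefix: str
    # (key, window index) -> number of requests seen in that window
    counters: dict = field(default_factory=dict)

    def hit(self, now: float) -> int:
        """Count one request; return the remaining quota (negative once exceeded)."""
        bucket = (self.key, int(now // self.time_window))
        self.counters[bucket] = self.counters.get(bucket, 0) + 1
        return self.count - self.counters[bucket]


def check(rules: list[Rule], now: float) -> tuple[int, dict[str, int]]:
    """Return (status, remaining per rule). Any exceeded rule rejects the request."""
    remaining = {r.header_prefix: r.hit(now) for r in rules}
    status = 429 if any(v < 0 for v in remaining.values()) else 200
    return status, {prefix: max(v, 0) for prefix, v in remaining.items()}


rules = [
    Rule(10, 1, "203.0.113.7_per_second", "per-second"),
    Rule(500, 60, "203.0.113.7_per_minute", "per-minute"),
]

# Eleven requests inside the same second: ten pass, the eleventh is rejected
# by the per-second rule while the per-minute rule still has quota left.
results = [check(rules, now=0.5) for _ in range(11)]
statuses = [status for status, _ in results]
print(statuses)
print(results[-1][1])
```

The per-rule dictionary returned by `check` plays the role of the prefixed response headers: a client can tell from it which window ran out and how long to back off.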
|
|
||
| ## Variable Support: Context-Aware Rate Limiting | ||
|
|
||
| ### The Problem | ||
|
|
||
| Static rate limits assume every consumer is equal. In reality, a free-tier user and an enterprise customer should have very different quotas. Before 3.16, supporting this meant creating separate routes for each tier — leading to route explosion and configuration drift. | ||
|
|
||
| ### The Solution | ||
|
|
||
| Variable support lets you pull rate limiting parameters directly from the request context. The `count`, `time_window`, and `key` fields now accept APISIX variables. | ||
|
|
||
| ### Example 1: Per-Tier Rate Limiting via HTTP Header | ||
|
|
||
| Suppose your authentication middleware injects an `X-Rate-Quota` header based on the user's subscription tier. Pair `limit-count` with an auth plugin such as `key-auth` so that `${consumer_name}` is available as the rate limit key: | ||
|
|
||
| ```json | ||
| { | ||
| "uri": "/api/v1/*", | ||
| "plugins": { | ||
| "key-auth": {}, | ||
| "limit-count": { | ||
| "rules": [ | ||
| { | ||
| "count": "${http_x_rate_quota ?? 100}", | ||
| "time_window": 60, | ||
| "key": "${consumer_name}" | ||
| } | ||
| ], | ||
| "rejected_code": 429 | ||
| } | ||
| }, | ||
| "upstream": { | ||
| "type": "roundrobin", | ||
| "nodes": { | ||
| "127.0.0.1:1980": 1 | ||
| } | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| Now the same route handles all tiers: | ||
|
|
||
| | Tier | `X-Rate-Quota` Header | Effective Limit | | ||
| |------|----------------------|-----------------| | ||
| | Free | 100 | 100 req/min | | ||
| | Pro | 1000 | 1,000 req/min | | ||
| | Enterprise | 50000 | 50,000 req/min | | ||
|
|
||
| One route. One plugin configuration. All tiers. | ||
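To make the fallback in `${http_x_rate_quota ?? 100}` concrete, here is a minimal Python sketch of how such a template could resolve against a request. It is not APISIX's resolver: the regular expression, the `resolve` and `effective_count` helpers, and the choice to treat an empty value like a missing one are all assumptions made for illustration.

```python
from __future__ import annotations

import re

# Matches ${name} or ${name ?? default}. This grammar is an assumption for the sketch.
_VAR = re.compile(r"\$\{\s*(\w+)\s*(?:\?\?\s*([^}]*?)\s*)?\}")


def resolve(template: str, ctx: dict[str, str]) -> str:
    """Substitute request variables, using the default when a variable is missing."""
    def repl(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        value = ctx.get(name)
        if value:  # present and non-empty
            return value
        return default if default is not None else ""
    return _VAR.sub(repl, template)


def effective_count(headers: dict[str, str]) -> int:
    """Compute the per-minute limit for a request with the given headers."""
    # Request headers surface as http_* variables: X-Rate-Quota -> http_x_rate_quota.
    ctx = {"http_" + k.lower().replace("-", "_"): v for k, v in headers.items()}
    return int(resolve("${http_x_rate_quota ?? 100}", ctx))


print(effective_count({"X-Rate-Quota": "1000"}))  # Pro tier header present
print(effective_count({}))                        # header absent, default applies
```

Because the quota is read from a request header, the sketch only makes sense when that header is set by a trusted component, as the example above assumes.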
|
|
||
| ### Example 2: Multi-Tenant Isolation with Variable Combination | ||
|
|
||
| For a multi-tenant SaaS API, you can combine variables to create isolated rate limit buckets per tenant per endpoint: | ||
|
|
||
| ```json | ||
| { | ||
| "uri": "/api/v1/*", | ||
| "plugins": { | ||
| "limit-count": { | ||
| "rules": [ | ||
| { | ||
| "count": 1000, | ||
| "time_window": 60, | ||
| "key": "${http_x_tenant_id} ${uri}" | ||
| } | ||
| ], | ||
| "rejected_code": 429 | ||
| } | ||
| }, | ||
| "upstream": { | ||
| "type": "roundrobin", | ||
| "nodes": { | ||
| "127.0.0.1:1980": 1 | ||
| } | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| Tenant A calling `/api/v1/users` and Tenant B calling the same endpoint get independent counters. Tenant A calling `/api/v1/orders` gets yet another counter. This creates a natural per-tenant-per-endpoint isolation without any route duplication. | ||
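The isolation comes purely from the composed key. A small Python sketch (the tenant IDs and the `bucket_key` helper are hypothetical) shows how the three calls described above land in separate counters:

```python
from collections import Counter


def bucket_key(tenant_id: str, uri: str) -> str:
    # Mirrors the "${http_x_tenant_id} ${uri}" template: two variables joined by a space.
    return f"{tenant_id} {uri}"


requests = [
    ("tenant-a", "/api/v1/users"),
    ("tenant-b", "/api/v1/users"),
    ("tenant-a", "/api/v1/orders"),
    ("tenant-a", "/api/v1/users"),
]

counters = Counter(bucket_key(tenant, uri) for tenant, uri in requests)
for key, seen in sorted(counters.items()):
    print(f"{key!r}: {seen} request(s)")
```

Only the repeated call from `tenant-a` to `/api/v1/users` shares a bucket with an earlier request; every other combination starts from its own full quota of 1,000.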
|
|
||
| ### Example 3: Dynamic Concurrent Connection Limits | ||
|
|
||
| The `limit-conn` plugin also supports rules and variables, enabling dynamic concurrency control. The example below uses `key-auth` so each consumer gets its own connection quota, while a shared cap applies across all consumers using `${http_host ?? global}` as the shared key: | ||
|
|
||
| ```json | ||
| { | ||
| "uri": "/api/v1/inference", | ||
| "plugins": { | ||
| "key-auth": {}, | ||
| "limit-conn": { | ||
| "default_conn_delay": 0.1, | ||
| "rules": [ | ||
| { | ||
| "conn": 5, | ||
| "burst": 2, | ||
| "key": "${consumer_name}" | ||
| }, | ||
| { | ||
| "conn": 100, | ||
| "burst": 20, | ||
| "key": "${http_host ?? global}" | ||
| } | ||
| ], | ||
| "rejected_code": 503 | ||
| } | ||
| }, | ||
| "upstream": { | ||
| "type": "roundrobin", | ||
| "nodes": { | ||
| "127.0.0.1:1980": 1 | ||
| } | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| This allows each consumer 5 concurrent connections, with up to 2 more delayed rather than rejected, while a shared cap of 100 (burst 20) applies to all requests carrying the same `Host` header. No single consumer can monopolize backend capacity. | ||
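The interplay of `conn` and `burst` can be sketched as a small decision function. This is a simplified model of the usual semantics (requests above `conn` are delayed, requests above `conn + burst` are rejected); the `admit` and `admit_all` helpers are hypothetical, and letting the most restrictive rule decide the outcome is an assumption of this sketch rather than documented APISIX behaviour.

```python
SEVERITY = {"pass": 0, "delay": 1, "reject": 2}


def admit(in_flight: int, conn: int, burst: int) -> str:
    """Classify a new request given how many are already in flight for its key."""
    if in_flight < conn:
        return "pass"
    if in_flight < conn + burst:
        return "delay"  # held back (see default_conn_delay) instead of rejected
    return "reject"


def admit_all(checks: list) -> str:
    """Apply several (in_flight, conn, burst) checks; the most restrictive outcome wins."""
    outcomes = [admit(in_flight, conn, burst) for in_flight, conn, burst in checks]
    return max(outcomes, key=SEVERITY.__getitem__)


# A consumer with 5 requests already in flight, on a host serving 40 in total:
print(admit_all([(5, 5, 2), (40, 100, 20)]))   # per-consumer rule delays it
# A consumer with 1 request in flight, on a host already at 120 in total:
print(admit_all([(1, 5, 2), (120, 100, 20)]))  # shared cap rejects it
```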
|
|
||
| ## AI Rate Limiting: Token Budget Management | ||
|
|
||
| For AI gateway use cases, the `ai-rate-limiting` plugin works alongside `ai-proxy` to enforce token budgets at the gateway level. It combines multiple rules with variable support for fine-grained control: | ||
|
|
||
| ```json | ||
| { | ||
| "uri": "/v1/chat/completions", | ||
| "plugins": { | ||
| "ai-rate-limiting": { | ||
| "limit_strategy": "total_tokens", | ||
| "rules": [ | ||
| { | ||
| "count": 10000, | ||
| "time_window": 60, | ||
| "key": "${consumer_name}_per_minute", | ||
| "header_prefix": "consumer" | ||
| }, | ||
| { | ||
| "count": 500000, | ||
| "time_window": 86400, | ||
| "key": "${consumer_name}_per_day", | ||
| "header_prefix": "daily" | ||
| }, | ||
| { | ||
| "count": 1000000, | ||
| "time_window": 60, | ||
| "key": "${http_host ?? global}", | ||
| "header_prefix": "global" | ||
| } | ||
| ], | ||
| "rejected_code": 429 | ||
| } | ||
| }, | ||
| "upstream": { | ||
| "type": "roundrobin", | ||
| "nodes": { | ||
| "127.0.0.1:1980": 1 | ||
| } | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
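| This configuration enforces three simultaneous constraints. The per-consumer rules assume an authentication plugin such as `key-auth` is also enabled on the route so that `${consumer_name}` resolves: | ||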
|
|
||
| 1. **Per-consumer burst**: 10,000 tokens per minute per consumer | ||
| 2. **Per-consumer daily**: 500,000 tokens per day per consumer | ||
| 3. **Global capacity**: 1,000,000 tokens per minute across all consumers | ||
|
|
||
| As AI API costs scale directly with token usage, this kind of layered budget control is essential for production AI gateways. | ||
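A quick back-of-the-envelope calculation shows how the three budgets above interact. The numbers come straight from the configuration; the variable names are only for this sketch.

```python
PER_MINUTE = 10_000           # tokens per consumer per minute
PER_DAY = 500_000             # tokens per consumer per day
GLOBAL_PER_MINUTE = 1_000_000 # tokens per minute across all consumers

# A consumer running flat out at the per-minute cap drains the daily budget in:
minutes_to_exhaust_daily = PER_DAY / PER_MINUTE

# Without the daily rule, the per-minute cap alone would allow this much per day:
unbounded_daily = PER_MINUTE * 60 * 24

# Number of consumers that can hit their per-minute cap before the global cap binds:
consumers_at_full_rate = GLOBAL_PER_MINUTE // PER_MINUTE

print(f"daily budget lasts {minutes_to_exhaust_daily:.0f} minutes at full rate")
print(f"per-minute cap alone would allow {unbounded_daily:,} tokens/day")
print(f"global cap binds beyond {consumers_at_full_rate} fully active consumers")
```

In other words, the per-minute rule shapes bursts, the daily rule is what actually bounds cost per consumer, and the global rule only matters once roughly a hundred consumers are saturating their per-minute quota at the same time.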
|
|
||
| ## Combining Multiple Rules with Variables | ||
|
|
||
| The real power emerges when you combine both features. Here is a complete example for an API platform with tiered pricing. It uses `key-auth` to identify consumers, reads per-consumer quotas from request headers, and maintains a shared global safety cap via `${http_host ?? global}`: | ||
|
|
||
| ```json | ||
| { | ||
| "uri": "/api/v1/*", | ||
| "plugins": { | ||
| "key-auth": {}, | ||
| "limit-count": { | ||
| "rules": [ | ||
| { | ||
| "count": "${http_x_burst_quota ?? 10}", | ||
| "time_window": 1, | ||
| "key": "${consumer_name}_per_second", | ||
| "header_prefix": "burst" | ||
| }, | ||
| { | ||
| "count": "${http_x_sustained_quota ?? 500}", | ||
| "time_window": 60, | ||
| "key": "${consumer_name}_per_minute", | ||
| "header_prefix": "sustained" | ||
| }, | ||
| { | ||
| "count": 100000, | ||
| "time_window": 60, | ||
| "key": "${http_host ?? global}", | ||
| "header_prefix": "global" | ||
| } | ||
| ], | ||
| "rejected_code": 429 | ||
| } | ||
| }, | ||
| "upstream": { | ||
| "type": "roundrobin", | ||
| "nodes": { | ||
| "127.0.0.1:1980": 1 | ||
| } | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| The authentication layer sets per-consumer burst and sustained quotas via headers. APISIX enforces both per-consumer limits dynamically while maintaining a static global safety cap. No route duplication. No configuration drift between tiers. | ||
|
|
||
| ## What's Next | ||
|
|
||
| The `limit-req` plugin (leaky bucket algorithm) does not yet support the `rules` array ([#13179](https://github.com/apache/apisix/issues/13179)). We welcome community contributions to bring it to feature parity. | ||
|
|
||
| We are also exploring deeper integration with external policy engines, enabling rate limiting quotas to be fetched from external key-value stores or policy services at runtime. | ||
|
|
||
| ## Getting Started | ||
|
|
||
| Upgrade to APISIX 3.16: | ||
|
|
||
| ```bash | ||
| # Docker | ||
| docker pull apache/apisix:3.16.0 | ||
|
|
||
| # Helm | ||
| helm repo update | ||
| helm upgrade apisix apisix/apisix --set image.tag=3.16.0 | ||
| ``` | ||
|
|
||
| Check the full documentation: | ||
|
|
||
| - [limit-count plugin](https://apisix.apache.org/docs/apisix/plugins/limit-count/) | ||
| - [limit-conn plugin](https://apisix.apache.org/docs/apisix/plugins/limit-conn/) | ||
| - [ai-rate-limiting plugin](https://apisix.apache.org/docs/apisix/plugins/ai-rate-limiting/) | ||