diff --git a/blog/en/blog/2026/04/14/apisix-3.16-dynamic-rate-limiting.md b/blog/en/blog/2026/04/14/apisix-3.16-dynamic-rate-limiting.md new file mode 100644 index 0000000000000..447c9985da268 --- /dev/null +++ b/blog/en/blog/2026/04/14/apisix-3.16-dynamic-rate-limiting.md @@ -0,0 +1,332 @@ +--- +title: "What's New in Apache APISIX 3.16: Dynamic Rate Limiting for Your API Gateway" +authors: + - name: "Ming Wen" + title: "Author" + url: "https://github.com/moonming" + image_url: "https://github.com/moonming.png" +keywords: + - Apache APISIX + - API Gateway + - Rate Limiting + - Dynamic Rate Limiting + - AI Gateway + - Multi-Tenant + - Token Budget +description: Apache APISIX 3.16 introduces dynamic rate limiting with multiple rules and variable support across limit-count, limit-conn, and ai-rate-limiting plugins, enabling context-aware, per-tier, and multi-tenant rate limiting in a single route configuration. +tags: [Community] +--- + +Rate limiting is one of the most critical capabilities in any API gateway. Yet for years, most gateways — including APISIX — have treated it as a static, one-size-fits-all configuration: set a number, set a time window, done. + +In practice, real-world rate limiting is far more nuanced. A SaaS platform needs different quotas for free and paid users. An AI gateway must enforce token budgets that vary by model and consumer. A multi-tenant API must isolate rate limits per tenant without duplicating routes. + +Apache APISIX 3.16 addresses these challenges head-on with two powerful enhancements to the rate limiting plugins: **multiple rules** and **variable support**. Together, they transform rate limiting from static configuration into a dynamic, context-aware policy engine. + + + +## What Changed in APISIX 3.16 + +APISIX 3.16 introduces two complementary features across the `limit-count`, `limit-conn`, and `ai-rate-limiting` plugins: + +| Feature | Description | Supported Plugins | +|---------|-------------|-------------------| +| Multiple rules | Define an array of rate limiting rules with independent thresholds and time windows | `limit-count`, `limit-conn`, `ai-rate-limiting` | +| Variable support | Use APISIX variables (`$remote_addr`, `$http_*`, `$consumer_name`, etc.) in `key` and plugin-specific rate or threshold fields | `limit-count`, `limit-conn`, `ai-rate-limiting` | + +Both features are fully backward compatible. Existing configurations continue to work without modification. + +## Multiple Rules: Beyond Single-Threshold Rate Limiting + +### The Problem + +Consider a common requirement: limit an API to **10 requests per second** and **500 requests per minute**. Before 3.16, you had to configure two separate plugin instances or chain multiple routes. This was verbose, error-prone, and hard to maintain. + +### The Solution + +The new `rules` array lets you define multiple rate limiting policies in a single plugin configuration. Each rule operates independently with its own counter, time window, and key. + +```json +{ + "uri": "/api/v1/*", + "plugins": { + "limit-count": { + "rules": [ + { + "count": 10, + "time_window": 1, + "key": "${remote_addr}_per_second", + "header_prefix": "per-second" + }, + { + "count": 500, + "time_window": 60, + "key": "${remote_addr}_per_minute", + "header_prefix": "per-minute" + }, + { + "count": 10000, + "time_window": 86400, + "key": "${remote_addr}_per_day", + "header_prefix": "per-day" + } + ], + "rejected_code": 429 + } + }, + "upstream": { + "type": "roundrobin", + "nodes": { + "127.0.0.1:1980": 1 + } + } +} +``` + +With this configuration, APISIX enforces all three limits simultaneously. A client hitting the per-second limit receives a `429` response with headers indicating which limit was exceeded: + +``` +X-Per-Second-RateLimit-Limit: 10 +X-Per-Second-RateLimit-Remaining: 0 +X-Per-Second-RateLimit-Reset: 1 +X-Per-Minute-RateLimit-Limit: 500 +X-Per-Minute-RateLimit-Remaining: 499 +X-Per-Minute-RateLimit-Reset: 60 +``` + +The `header_prefix` field lets clients distinguish which rule triggered the rejection — critical for debugging and client-side retry logic. + +## Variable Support: Context-Aware Rate Limiting + +### The Problem + +Static rate limits assume every consumer is equal. In reality, a free-tier user and an enterprise customer should have very different quotas. Before 3.16, supporting this meant creating separate routes for each tier — leading to route explosion and configuration drift. + +### The Solution + +Variable support lets you pull rate limiting parameters directly from the request context. The `count`, `time_window`, and `key` fields now accept APISIX variables. + +### Example 1: Per-Tier Rate Limiting via HTTP Header + +Suppose your authentication middleware injects an `X-Rate-Quota` header based on the user's subscription tier. Pair `limit-count` with an auth plugin such as `key-auth` so that `${consumer_name}` is available as the rate limit key: + +```json +{ + "uri": "/api/v1/*", + "plugins": { + "key-auth": {}, + "limit-count": { + "rules": [ + { + "count": "${http_x_rate_quota ?? 100}", + "time_window": 60, + "key": "${consumer_name}" + } + ], + "rejected_code": 429 + } + }, + "upstream": { + "type": "roundrobin", + "nodes": { + "127.0.0.1:1980": 1 + } + } +} +``` + +Now the same route handles all tiers: + +| Tier | `X-Rate-Quota` Header | Effective Limit | +|------|----------------------|-----------------| +| Free | 100 | 100 req/min | +| Pro | 1000 | 1,000 req/min | +| Enterprise | 50000 | 50,000 req/min | + +One route. One plugin configuration. All tiers. + +### Example 2: Multi-Tenant Isolation with Variable Combination + +For a multi-tenant SaaS API, you can combine variables to create isolated rate limit buckets per tenant per endpoint: + +```json +{ + "uri": "/api/v1/*", + "plugins": { + "limit-count": { + "rules": [ + { + "count": 1000, + "time_window": 60, + "key": "${http_x_tenant_id} ${uri}" + } + ], + "rejected_code": 429 + } + }, + "upstream": { + "type": "roundrobin", + "nodes": { + "127.0.0.1:1980": 1 + } + } +} +``` + +Tenant A calling `/api/v1/users` and Tenant B calling the same endpoint get independent counters. Tenant A calling `/api/v1/orders` gets yet another counter. This creates a natural per-tenant-per-endpoint isolation without any route duplication. + +### Example 3: Dynamic Concurrent Connection Limits + +The `limit-conn` plugin also supports rules and variables, enabling dynamic concurrency control. The example below uses `key-auth` so each consumer gets its own connection quota, while a shared cap applies across all consumers using `${http_host ?? global}` as the shared key: + +```json +{ + "uri": "/api/v1/inference", + "plugins": { + "key-auth": {}, + "limit-conn": { + "default_conn_delay": 0.1, + "rules": [ + { + "conn": 5, + "burst": 2, + "key": "${consumer_name}" + }, + { + "conn": 100, + "burst": 20, + "key": "${http_host ?? global}" + } + ], + "rejected_code": 503 + } + }, + "upstream": { + "type": "roundrobin", + "nodes": { + "127.0.0.1:1980": 1 + } + } +} +``` + +This limits each consumer to 5 concurrent connections while capping the total at 100 — preventing any single consumer from monopolizing backend capacity. + +## AI Rate Limiting: Token Budget Management + +For AI gateway use cases, the `ai-rate-limiting` plugin works alongside `ai-proxy` to enforce token budgets at the gateway level. It combines multiple rules with variable support for fine-grained control: + +```json +{ + "uri": "/v1/chat/completions", + "plugins": { + "ai-rate-limiting": { + "limit_strategy": "total_tokens", + "rules": [ + { + "count": 10000, + "time_window": 60, + "key": "${consumer_name}_per_minute", + "header_prefix": "consumer" + }, + { + "count": 500000, + "time_window": 86400, + "key": "${consumer_name}_per_day", + "header_prefix": "daily" + }, + { + "count": 1000000, + "time_window": 60, + "key": "${http_host ?? global}", + "header_prefix": "global" + } + ], + "rejected_code": 429 + } + }, + "upstream": { + "type": "roundrobin", + "nodes": { + "127.0.0.1:1980": 1 + } + } +} +``` + +This configuration enforces three simultaneous constraints: + +1. **Per-consumer burst**: 10,000 tokens per minute per consumer +2. **Per-consumer daily**: 500,000 tokens per day per consumer +3. **Global capacity**: 1,000,000 tokens per minute across all consumers + +As AI API costs scale directly with token usage, this kind of layered budget control is essential for production AI gateways. + +## Combining Multiple Rules with Variables + +The real power emerges when you combine both features. Here is a complete example for an API platform with tiered pricing. It uses `key-auth` to identify consumers, reads per-consumer quotas from request headers, and maintains a shared global safety cap via `${http_host ?? global}`: + +```json +{ + "uri": "/api/v1/*", + "plugins": { + "key-auth": {}, + "limit-count": { + "rules": [ + { + "count": "${http_x_burst_quota ?? 10}", + "time_window": 1, + "key": "${consumer_name}_per_second", + "header_prefix": "burst" + }, + { + "count": "${http_x_sustained_quota ?? 500}", + "time_window": 60, + "key": "${consumer_name}_per_minute", + "header_prefix": "sustained" + }, + { + "count": 100000, + "time_window": 60, + "key": "${http_host ?? global}", + "header_prefix": "global" + } + ], + "rejected_code": 429 + } + }, + "upstream": { + "type": "roundrobin", + "nodes": { + "127.0.0.1:1980": 1 + } + } +} +``` + +The authentication layer sets per-consumer burst and sustained quotas via headers. APISIX enforces both per-consumer limits dynamically while maintaining a static global safety cap. No route duplication. No configuration drift between tiers. + +## What's Next + +The `limit-req` plugin (leaky bucket algorithm) does not yet support the `rules` array ([#13179](https://github.com/apache/apisix/issues/13179)). We welcome community contributions to bring it to feature parity. + +We are also exploring deeper integration with external policy engines, enabling rate limiting quotas to be fetched from external key-value stores or policy services at runtime. + +## Getting Started + +Upgrade to APISIX 3.16: + +```bash +# Docker +docker pull apache/apisix:3.16.0 + +# Helm +helm repo update +helm upgrade apisix apisix/apisix --set image.tag=3.16.0 +``` + +Check the full documentation: + +- [limit-count plugin](https://apisix.apache.org/docs/apisix/plugins/limit-count/) +- [limit-conn plugin](https://apisix.apache.org/docs/apisix/plugins/limit-conn/) +- [ai-rate-limiting plugin](https://apisix.apache.org/docs/apisix/plugins/ai-rate-limiting/)