The Latency vs. Consistency Trade-off in Rate Limiting: Why We Need Register-Only Mode
Rate limiting is often visualized as a bouncer at a club: check ID, check capacity, then let them in. In distributed systems, this translates to a synchronous check before processing a request. While intuitive, this "check-then-act" model can be a silent killer for high-performance load balancers.
Today, let's explore a counter-intuitive pattern: the Async Register-Only Mode. We'll examine why sacrificing strict consistency for latency is often the right choice for high-throughput gateways.
The Hidden Cost of Synchronous Checks
In a standard synchronous rate limiter, the flow looks like this:
Request -> Gateway -> RPC to Quota Service -> Gateway -> Backend
The Gateway must wait for the Quota Service to respond. If your Quota Service is in another zone or region, you are adding significant network round-trip time (RTT) to every single request.
- Latency Penalty: A 10ms RTT to the quota service means your API's baseline latency is now 10ms + processing time. For a service aiming for single-digit millisecond responses, this is a non-starter.
- Availability Coupling: If the Quota Service degrades, your Gateway degrades. You've introduced a hard dependency on a control-plane component for data-plane availability.
Enter Register-Only Mode (Async Pipeline)
The "Register-Only" pattern flips the model: Act first, account later.
When a request arrives:
- Process Immediately: The Gateway allows the request to proceed to the backend without waiting.
- Async Registration: In parallel (e.g., via a Goroutine or a buffered channel), the Gateway sends a "usage event" to the Quota Service.
The Quota Service aggregates these events. If a threshold is breached, it pushes a "throttle" signal back to the Gateway. The Gateway then switches to local rejection mode for a short period.
The Trade-off Analysis
This architecture explicitly trades consistency for latency and availability.
- Consistency (Precision): We accept a margin of error. Since we don't block, a sudden burst of traffic can exceed the limit before the async signal loops back to throttle it. We might allow 105 requests under a 100 RPS limit.
- Latency (Performance): We remove the Quota Service RTT from the critical path. The user experience is unaffected by the distance to the rate limiter.
Go Implementation: Sync vs. Async
Let's simulate this trade-off in Go. We'll compare a Strict Mode (Sync) against a Register-Only Mode (Async).
package main

import (
    "context"
    "fmt"
    "sync"
    "time"
)

// QuotaClient simulates a remote Quota Service with network latency
type QuotaClient struct {
    mu sync.Mutex // reserved to guard shared counters in a fuller implementation
}

// QuotaResult represents the response from the service
type QuotaResult struct {
    Allowed bool
    Latency time.Duration
}

func (c *QuotaClient) Acquire(ctx context.Context, id string) QuotaResult {
    start := time.Now()
    // Simulate network IO: 100ms latency
    select {
    case <-time.After(100 * time.Millisecond):
        return QuotaResult{Allowed: true, Latency: time.Since(start)}
    case <-ctx.Done():
        return QuotaResult{Allowed: false, Latency: time.Since(start)}
    }
}

// Balancer simulates our Load Balancer / Gateway
type Balancer struct {
    quotaClient *QuotaClient
    // Mode toggle: true = Async Register-Only, false = Sync Strict
    registerOnly bool
}

func (b *Balancer) HandleRequest(id string) {
    if b.registerOnly {
        b.handleAsync(id)
    } else {
        b.handleSync(id)
    }
}

// Sync Mode: block until quota is confirmed.
// Downside: latency spike, hard dependency.
func (b *Balancer) handleSync(id string) {
    start := time.Now()
    // Timeout to prevent hanging indefinitely
    ctx, cancel := context.WithTimeout(context.Background(), 200*time.Millisecond)
    defer cancel()
    result := b.quotaClient.Acquire(ctx, id)
    if result.Allowed {
        fmt.Printf("[Sync] Req %s: Quota OK (took %v), Processing...\n", id, time.Since(start))
    } else {
        fmt.Printf("[Sync] Req %s: Rate Limited or Timeout.\n", id)
    }
}

// Async Mode: fire-and-forget.
// Upside: zero added latency on the request path.
func (b *Balancer) handleAsync(id string) {
    start := time.Now()
    // 1. Optimistic execution: process immediately
    fmt.Printf("[Async] Req %s: Allowed (Non-blocking), Processing...\n", id)
    // 2. Async reporting: report usage in the background.
    // Note: in prod, use a worker pool/buffer to avoid unbounded goroutines
    go func() {
        ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second)
        defer cancel()
        _ = b.quotaClient.Acquire(ctx, id)
        // This report updates the global counter eventually
    }()
    // The user request completes in microseconds, unaffected by the 100ms quota latency
    fmt.Printf("[Async] Req %s: Done (Main path latency: %v)\n", id, time.Since(start))
}

func main() {
    client := &QuotaClient{}

    fmt.Println("--- Scenario A: Sync Strict Mode (Safety First) ---")
    balancerSync := &Balancer{quotaClient: client, registerOnly: false}
    balancerSync.HandleRequest("REQ-001")

    fmt.Println("\n--- Scenario B: Async Register-Only Mode (Performance First) ---")
    balancerAsync := &Balancer{quotaClient: client, registerOnly: true}
    balancerAsync.HandleRequest("REQ-002")

    // Wait for the async goroutine to finish for demo purposes
    time.Sleep(200 * time.Millisecond)
    fmt.Println("\nEnd of Demo.")
}
Results
--- Scenario A: Sync Strict Mode (Safety First) ---
[Sync] Req REQ-001: Quota OK (took 100.12ms), Processing...
--- Scenario B: Async Register-Only Mode (Performance First) ---
[Async] Req REQ-002: Allowed (Non-blocking), Processing...
[Async] Req REQ-002: Done (Main path latency: 45µs)
The difference is stark. Sync mode forces a 100ms penalty on every request; Async mode completes in 45µs, orders of magnitude faster.
Conclusion
There is no "perfect" rate limiter.
- If you are building a Payment Gateway, you likely need Strict Mode. The cost of an accidental over-limit transaction is high (consistency matters most).
- If you are building a High-Traffic API Gateway, blocking user requests for a counter check is often unacceptable. Register-Only Mode is the pragmatic choice: it protects the backend from sustained overload while keeping the happy path strictly low-latency.
Engineering is about choosing the right trade-off for the right problem.