MFS· Fullstack Engineer
go · backend · redis

Rate Limiting in Go: A Practical Guide

How to implement fixed window, sliding window, and token bucket rate limiting in Go — with Redis for distributed systems.

Rate limiting is one of those things that sounds simple but has surprising depth once you start thinking about distributed systems. This post walks through three algorithms and when to use each.

Why rate limiting matters

Without rate limiting, a single misbehaving client, or an attacker, can bring down your API. Even well-intentioned clients can trigger thundering herd problems, such as synchronized retries after an outage.

Rate limiting is not just a security measure. It's also about fairness and cost control in multi-tenant systems.

The three algorithms

1. Fixed Window

The simplest approach. Count requests in a fixed time window (e.g., 100 requests per minute). Reset the counter at the window boundary.

type FixedWindow struct {
    limit    int
    window   time.Duration
    mu       sync.Mutex
    count    int
    resetAt  time.Time
}

func (fw *FixedWindow) Allow() bool {
    fw.mu.Lock()
    defer fw.mu.Unlock()

    now := time.Now()
    if now.After(fw.resetAt) {
        fw.count = 0
        fw.resetAt = now.Add(fw.window)
    }

    if fw.count >= fw.limit {
        return false
    }
    fw.count++
    return true
}

Problem: Boundary bursting. A client can send 100 requests at 00:59 and 100 more at 01:01 — effectively 200 requests in 2 seconds.
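To make the boundary problem concrete, here is a small simulation reusing the FixedWindow type above: with a limit of 5 per 200 ms, a burst right before the reset plus a burst right after gets 10 requests through in just over one window.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type FixedWindow struct {
	limit   int
	window  time.Duration
	mu      sync.Mutex
	count   int
	resetAt time.Time
}

func (fw *FixedWindow) Allow() bool {
	fw.mu.Lock()
	defer fw.mu.Unlock()

	now := time.Now()
	if now.After(fw.resetAt) {
		fw.count = 0
		fw.resetAt = now.Add(fw.window)
	}

	if fw.count >= fw.limit {
		return false
	}
	fw.count++
	return true
}

// boundaryBurst sends one burst at the end of a window and another right
// after the boundary, returning how many requests were allowed in total.
func boundaryBurst() int {
	fw := &FixedWindow{limit: 5, window: 200 * time.Millisecond}
	allowed := 0
	for i := 0; i < 5; i++ { // burst #1: fills the current window
		if fw.Allow() {
			allowed++
		}
	}
	time.Sleep(250 * time.Millisecond) // cross the window boundary
	for i := 0; i < 5; i++ { // burst #2: counter has reset, all pass again
		if fw.Allow() {
			allowed++
		}
	}
	return allowed
}

func main() {
	fmt.Printf("allowed %d requests in ~250ms despite a limit of 5 per 200ms\n", boundaryBurst())
}
```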

2. Sliding Window

Tracks requests with timestamps instead of a simple counter. Much more accurate.

type SlidingWindow struct {
    limit    int
    window   time.Duration
    mu       sync.Mutex
    requests []time.Time
}

func (sw *SlidingWindow) Allow() bool {
    sw.mu.Lock()
    defer sw.mu.Unlock()

    now := time.Now()
    cutoff := now.Add(-sw.window)

    // Remove expired entries
    valid := sw.requests[:0]
    for _, t := range sw.requests {
        if t.After(cutoff) {
            valid = append(valid, t)
        }
    }
    sw.requests = valid

    if len(sw.requests) >= sw.limit {
        return false
    }
    sw.requests = append(sw.requests, now)
    return true
}

Problem: Memory grows with traffic. For high-volume APIs, use Redis sorted sets instead of in-memory slices.
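That Redis variant is worth sketching: a sliding window maps naturally onto a sorted set, with one member per request scored by its timestamp. Here is the idea as a Lua script, in the same style as the token-bucket script shown later in the post; the key layout and argument order are my own, not from any particular library.

```lua
-- Sliding window over a sorted set: member = unique request id, score = timestamp.
-- KEYS[1] = limiter key
-- ARGV[1] = limit, ARGV[2] = window in seconds, ARGV[3] = now in seconds,
-- ARGV[4] = a unique request id supplied by the caller
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local now = tonumber(ARGV[3])

-- Drop entries that have fallen out of the window.
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)

if redis.call('ZCARD', key) >= limit then
    return 0
end

redis.call('ZADD', key, now, ARGV[4])
redis.call('EXPIRE', key, math.ceil(window))
return 1
```

Memory is still O(n) per key, but it lives in Redis, expires with the window, and the whole check-and-record runs atomically.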

3. Token Bucket

The most flexible algorithm. Tokens refill at a constant rate; requests consume tokens.

type TokenBucket struct {
    capacity float64
    rate     float64 // tokens per second
    tokens   float64
    lastSeen time.Time
    mu       sync.Mutex
}

func (tb *TokenBucket) Allow() bool {
    tb.mu.Lock()
    defer tb.mu.Unlock()

    now := time.Now()
    elapsed := now.Sub(tb.lastSeen).Seconds()
    tb.tokens = min(tb.capacity, tb.tokens+elapsed*tb.rate)
    tb.lastSeen = now

    if tb.tokens < 1 {
        return false
    }
    tb.tokens--
    return true
}

Token bucket is great for APIs where you want to allow short bursts but maintain a long-term average.
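A quick sanity check of that burst-then-throttle behavior, reusing the TokenBucket type above (the bucket is seeded full here; a constructor would normally do that):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type TokenBucket struct {
	capacity float64
	rate     float64 // tokens per second
	tokens   float64
	lastSeen time.Time
	mu       sync.Mutex
}

func (tb *TokenBucket) Allow() bool {
	tb.mu.Lock()
	defer tb.mu.Unlock()

	now := time.Now()
	elapsed := now.Sub(tb.lastSeen).Seconds()
	tb.tokens = min(tb.capacity, tb.tokens+elapsed*tb.rate)
	tb.lastSeen = now

	if tb.tokens < 1 {
		return false
	}
	tb.tokens--
	return true
}

// burst fires five back-to-back requests at a full bucket of capacity 3
// refilling at 0.5 tokens/sec, and reports which were allowed.
func burst() []bool {
	tb := &TokenBucket{capacity: 3, rate: 0.5, tokens: 3, lastSeen: time.Now()}
	results := make([]bool, 5)
	for i := range results {
		results[i] = tb.Allow()
	}
	return results
}

func main() {
	fmt.Println(burst()) // first three pass, the rest are throttled
}
```

The burst of three drains the bucket instantly; the long-term average is then capped by the refill rate, not the capacity.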

Which to use?

| Algorithm      | Burst Handling | Memory | Accuracy | Distributed |
| -------------- | -------------- | ------ | -------- | ----------- |
| Fixed Window   | Poor           | O(1)   | Low      | Easy        |
| Sliding Window | Good           | O(n)   | High     | Harder      |
| Token Bucket   | Excellent      | O(1)   | Medium   | Medium      |

Going distributed with Redis

For multi-instance deployments, in-memory limiters don't work. Use Redis with Lua scripts to make the check-and-increment atomic:

-- Lua script for token bucket in Redis
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])

local bucket = redis.call('HMGET', key, 'tokens', 'last_seen')
local tokens = tonumber(bucket[1]) or capacity
local last_seen = tonumber(bucket[2]) or now

local elapsed = now - last_seen
tokens = math.min(capacity, tokens + elapsed * rate)

if tokens < 1 then
    return 0
end

tokens = tokens - 1
redis.call('HSET', key, 'tokens', tokens, 'last_seen', now)
redis.call('EXPIRE', key, math.ceil(capacity / rate) * 2)
return 1
💡 Always use a Lua script for Redis rate limiting: it makes the read-modify-write atomic without needing WATCH/MULTI/EXEC transactions.
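To call the script from Go, one option is go-redis's Script helper, which caches the script and handles the EVALSHA/EVAL fallback. A sketch assuming the github.com/redis/go-redis/v9 client and a live Redis at localhost; the key name and limiter parameters are illustrative:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// The token-bucket script from above, embedded as a string.
const tokenBucketLua = `
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])

local bucket = redis.call('HMGET', key, 'tokens', 'last_seen')
local tokens = tonumber(bucket[1]) or capacity
local last_seen = tonumber(bucket[2]) or now

local elapsed = now - last_seen
tokens = math.min(capacity, tokens + elapsed * rate)

if tokens < 1 then
    return 0
end

tokens = tokens - 1
redis.call('HSET', key, 'tokens', tokens, 'last_seen', now)
redis.call('EXPIRE', key, math.ceil(capacity / rate) * 2)
return 1
`

var tokenBucket = redis.NewScript(tokenBucketLua)

// allow runs the script atomically for one key.
func allow(ctx context.Context, rdb *redis.Client, key string, capacity, rate float64) (bool, error) {
	now := float64(time.Now().UnixMilli()) / 1000.0
	res, err := tokenBucket.Run(ctx, rdb, []string{key}, capacity, rate, now).Int()
	if err != nil {
		return false, err
	}
	return res == 1, nil
}

func main() {
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	ok, err := allow(context.Background(), rdb, "ratelimit:user:42", 10, 5)
	fmt.Println(ok, err)
}
```

Passing `now` from the caller (rather than calling TIME inside the script) keeps the script deterministic, which matters for script replication.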

Wrapping it as middleware

func RateLimitMiddleware(limiter Limiter) func(http.Handler) http.Handler {
    return func(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            if !limiter.Allow() {
                w.Header().Set("Retry-After", "60")
                http.Error(w, "Too Many Requests", http.StatusTooManyRequests)
                return
            }
            next.ServeHTTP(w, r)
        })
    }
}
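The post doesn't show the Limiter interface the middleware expects; presumably it is just an Allow method. A self-contained sketch with that assumed interface and a toy two-request limiter, exercised via net/http/httptest:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Limiter is the interface the middleware needs (assumed: just Allow).
type Limiter interface {
	Allow() bool
}

// budget is a toy limiter permitting a fixed number of requests, for demo only.
type budget struct{ remaining int }

func (b *budget) Allow() bool {
	if b.remaining <= 0 {
		return false
	}
	b.remaining--
	return true
}

func RateLimitMiddleware(limiter Limiter) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			if !limiter.Allow() {
				w.Header().Set("Retry-After", "60")
				http.Error(w, "Too Many Requests", http.StatusTooManyRequests)
				return
			}
			next.ServeHTTP(w, r)
		})
	}
}

// statusCodes issues n requests through the middleware and returns the codes.
func statusCodes(n int) []int {
	h := RateLimitMiddleware(&budget{remaining: 2})(
		http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			w.WriteHeader(http.StatusOK)
		}))
	codes := make([]int, n)
	for i := range codes {
		rec := httptest.NewRecorder()
		h.ServeHTTP(rec, httptest.NewRequest("GET", "/", nil))
		codes[i] = rec.Code
	}
	return codes
}

func main() {
	fmt.Println(statusCodes(3)) // [200 200 429]
}
```

Note this wires one limiter to the whole handler chain; a real deployment would key limiters per IP or per user, as the next paragraph describes.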

That's the core of what I built in go-ratelimit. The library handles per-IP, per-user, and per-route limiting with a clean API.


Questions or corrections? Open an issue on GitHub or reach out via email.