Your API goes viral on Hacker News. Traffic spikes 100x. Your database melts. Everything goes down.
A developer writes a buggy script that hammers your endpoint in an infinite loop. One user takes down the whole system.
Your AI costs explode because nothing stops users from making 10,000 requests per minute.
Traffic you can't control will eventually destroy what you've built.
Foundational: every production API needs limits. The question is whether you set them or discover them during an outage.
Rate limiting counts requests and blocks them when they exceed a threshold. "100 requests per minute" means request 101 gets rejected. The counter resets every minute. Simple concept, massive protection.
You can limit by user, by IP, by API key, by endpoint, or any combination. Different limits for different tiers: free users get 10 requests/minute, paid users get 1,000, enterprise gets 10,000.
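As a rough sketch (the tier names, limits, and key format here are illustrative, not prescriptive), the limiter only needs a way to build a counting key and look up the right threshold:

```python
# Hypothetical tier table: requests allowed per minute for each plan.
TIER_LIMITS = {"free": 10, "paid": 1_000, "enterprise": 10_000}

def limit_key(user_id: str, endpoint: str) -> str:
    """Count per user per endpoint; swap in IP address or API key as the identifier."""
    return f"{user_id}:{endpoint}"

def limit_for(tier: str) -> int:
    """Unknown or missing tiers fall back to the strictest limit."""
    return TIER_LIMITS.get(tier, TIER_LIMITS["free"])
```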
Rate limiting is the cheapest insurance policy in software. A few lines of code can save you from outages, runaway costs, and abusive users.
Rate limiting solves a universal problem: how do you protect shared resources from being overwhelmed by any single consumer?
Count requests in a time window. When count exceeds threshold, reject. Reset the count periodically. This pattern works whether you're limiting API calls, login attempts, or AI token usage.
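Here is a minimal in-memory sketch of that pattern (single process only; a real deployment would keep the counts in a shared store so every instance enforces the same limit):

```python
import time
from collections import defaultdict

WINDOW_SECONDS = 60
LIMIT = 100

_counts: dict[str, int] = defaultdict(int)
_window_start = time.monotonic()

def allow(key: str) -> bool:
    """Count requests per key in the current window; reject once over the limit."""
    global _window_start
    now = time.monotonic()
    if now - _window_start >= WINDOW_SECONDS:  # reset the counts periodically
        _counts.clear()
        _window_start = now
    _counts[key] += 1
    return _counts[key] <= LIMIT
```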
This API allows 10 requests per 10 seconds. Try to exceed the limit and see what happens.
What you just discovered:
Once you hit the limit, every additional request is immediately rejected with 429. The Retry-After header tells you exactly when to try again. No wasted resources processing requests that would fail anyway.
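On the client side, honoring that header takes a few lines. This sketch assumes the Python `requests` library and a Retry-After value expressed in seconds, as in the demo above; any HTTP client exposes the status code and headers the same way:

```python
import time
import requests  # assumes the requests library; any HTTP client works similarly

def get_with_backoff(url: str, max_attempts: int = 5) -> requests.Response:
    """Respect 429 responses: sleep for Retry-After seconds instead of hammering."""
    for _ in range(max_attempts):
        resp = requests.get(url)
        if resp.status_code != 429:
            return resp
        wait = int(resp.headers.get("Retry-After", "1"))  # assumes seconds, not a date
        time.sleep(wait)
    return resp
```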
Fixed window: a simple counter that resets on the clock
Count requests per minute (or hour). At :00, the counter resets. Easy to implement, easy to understand. But users can burst 100 requests at :59 and 100 more at :00.
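In practice a fixed window is usually one counter per identity per window in a shared store. This sketch assumes the `redis-py` client and a reachable Redis instance; the key layout is illustrative:

```python
import time
import redis  # assumes the redis-py client and a running Redis

r = redis.Redis()

def fixed_window_allow(user_id: str, limit: int = 100, window: int = 60) -> bool:
    """One counter per user per clock window; the key expires when the window ends."""
    bucket = int(time.time()) // window          # e.g. minute number since the epoch
    key = f"rl:{user_id}:{bucket}"
    pipe = r.pipeline()
    pipe.incr(key)
    pipe.expire(key, window)
    count, _ = pipe.execute()
    return count <= limit
```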
Sliding window: smoother limits without hard resets
Instead of resetting at :00, the window slides. At 1:30, you're looking at requests from 0:30 to 1:30. No boundary bursts, but slightly more complex to track.
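One way to implement this is the sliding-window log: keep a timestamp per request and count only the ones still inside the window. A minimal in-memory sketch:

```python
import time
from collections import defaultdict, deque

LIMIT = 100
WINDOW = 60.0  # seconds

_history: dict[str, deque] = defaultdict(deque)

def sliding_window_allow(key: str) -> bool:
    """Only requests made in the last WINDOW seconds count toward the limit."""
    now = time.monotonic()
    q = _history[key]
    while q and now - q[0] > WINDOW:  # drop requests that slid out of the window
        q.popleft()
    if len(q) >= LIMIT:
        return False
    q.append(now)
    return True
```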
Token bucket: allow controlled bursts
You have a bucket that refills with tokens at a steady rate. Each request spends a token. Unused tokens accumulate up to the bucket's capacity, which is what allows bursts. When the bucket is empty, requests wait or fail.
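A token bucket fits in a few lines; the rate and capacity below are illustrative:

```python
import time

class TokenBucket:
    """Refill at a steady rate; spend one token per request; capacity caps the burst."""

    def __init__(self, rate_per_sec: float, capacity: float):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Add tokens earned since the last call, but never exceed the capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A bucket created as `TokenBucket(rate_per_sec=100/60, capacity=20)` averages 100 requests per minute but lets a quiet client burst 20 at once.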
Your AI assistant is popular. Too popular. One power user made 5,000 requests yesterday. Your OpenAI bill is now $2,000 for a single user. With this flow, you cap costs per user while keeping the service running for everyone.
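A cost cap is just rate limiting with dollars (or model tokens) as the unit. This sketch tracks a hypothetical per-user daily budget in memory and checks it before the model call; the cap and the cost estimate are illustrative:

```python
import datetime
from collections import defaultdict

DAILY_BUDGET_USD = 5.00  # illustrative per-user cap
_spend: dict[tuple[str, str], float] = defaultdict(float)

def charge(user_id: str, estimated_cost_usd: float) -> bool:
    """Refuse the request if it would push today's spend for this user over the cap."""
    today = datetime.date.today().isoformat()
    key = (user_id, today)
    if _spend[key] + estimated_cost_usd > DAILY_BUDGET_USD:
        return False
    _spend[key] += estimated_cost_usd
    return True
```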
You put rate limiting on your API gateway. Feels safe. Then an internal service calls another internal service in a loop. No gateway in between. Internal DDoS.
Instead: Rate limit at every layer that matters: gateway, service-to-service, and database queries.
You return 429 Too Many Requests but don't tell clients when to retry. They immediately retry. You're now rate limiting their retries. Infinite frustration loop.
Instead: Always include Retry-After header. Tell clients exactly when they can try again.
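For example, a handler can compute how long is left in the current window and send it back with the 429. The sketch below uses Flask purely as an example framework; the header works the same anywhere:

```python
import time
from flask import Flask, jsonify  # Flask is only an example; any server can set this header

app = Flask(__name__)

LIMIT, WINDOW = 100, 60
_count, _window_start = 0, time.time()

@app.route("/api/resource")
def resource():
    global _count, _window_start
    now = time.time()
    if now - _window_start >= WINDOW:      # new window: reset the counter
        _count, _window_start = 0, now
    _count += 1
    if _count > LIMIT:
        retry_after = int(WINDOW - (now - _window_start)) + 1
        return jsonify(error="rate limit exceeded"), 429, {"Retry-After": str(retry_after)}
    return jsonify(status="ok")
```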
You set 100 requests/minute because it felt right. Six months later, legitimate usage patterns have changed. Good users are getting blocked. Bad users found workarounds.
Instead: Monitor your rate limit hits. Alert when good users hit limits. Adjust limits based on real usage.
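Instrumenting the limiter is the easy part: count every rejection with labels you can alert on. This sketch assumes the `prometheus_client` library; the metric name and labels are illustrative:

```python
from prometheus_client import Counter  # assumes the prometheus_client library

RATE_LIMIT_HITS = Counter(
    "rate_limit_hits_total",
    "Requests rejected by the rate limiter",
    ["tier", "endpoint"],
)

def record_rate_limit_hit(tier: str, endpoint: str) -> None:
    """Count every rejection by tier and endpoint; an alert on the paid tier
    catches the moment legitimate users start getting blocked."""
    RATE_LIMIT_HITS.labels(tier=tier, endpoint=endpoint).inc()
```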
You've learned how to protect your systems from overload. The natural next step is understanding what happens when limits are hit and how to handle failures gracefully.