Your API goes viral on Hacker News. Traffic spikes 100x. Your database melts. Everything goes down.
A developer writes a buggy script that hammers your endpoint in an infinite loop. One user takes down the whole system.
Your AI costs explode because nothing stops users from making 10,000 requests per minute.
Traffic you can't control will eventually destroy what you've built.
Foundational: every production API needs limits. The question is whether you set them or discover them during an outage.
Rate limiting counts requests and blocks them when they exceed a threshold. "100 requests per minute" means request 101 gets rejected. The counter resets every minute. Simple concept, massive protection.
You can limit by user, by IP, by API key, by endpoint, or any combination. Different limits for different tiers: free users get 10 requests/minute, paid users get 1,000, enterprise gets 10,000.
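As a rough sketch (the tier names, limits, and key format here are illustrative, not prescriptive), the limiter only needs a way to build a counting key and look up the right threshold:

```python
# Hypothetical tier table: requests allowed per minute for each plan.
TIER_LIMITS = {"free": 10, "paid": 1_000, "enterprise": 10_000}

def limit_key(user_id: str, endpoint: str) -> str:
    """Count per user per endpoint; swap in IP address or API key as the identifier."""
    return f"{user_id}:{endpoint}"

def limit_for(tier: str) -> int:
    """Unknown or missing tiers fall back to the strictest limit."""
    return TIER_LIMITS.get(tier, TIER_LIMITS["free"])
```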
Rate limiting is the cheapest insurance policy in software. A few lines of code can save you from outages, runaway costs, and abusive users.
Rate limiting solves a universal problem: how do you protect shared resources from being overwhelmed by any single consumer?
Count requests in a time window. When count exceeds threshold, reject. Reset the count periodically. This pattern works whether you're limiting API calls, login attempts, or AI token usage.
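Here is a minimal in-memory sketch of that pattern (single process only; a real deployment would keep the counts in a shared store so every instance enforces the same limit):

```python
import time
from collections import defaultdict

WINDOW_SECONDS = 60
LIMIT = 100

_counts: dict[str, int] = defaultdict(int)
_window_start = time.monotonic()

def allow(key: str) -> bool:
    """Count requests per key in the current window; reject once over the limit."""
    global _window_start
    now = time.monotonic()
    if now - _window_start >= WINDOW_SECONDS:  # reset the counts periodically
        _counts.clear()
        _window_start = now
    _counts[key] += 1
    return _counts[key] <= LIMIT
```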
This API allows 10 requests per 10 seconds. Try to exceed the limit and see what happens.
What you just discovered:
Once you hit the limit, every additional request is immediately rejected with 429. The Retry-After header tells you exactly when to try again. No wasted resources processing requests that would fail anyway.
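On the client side, honoring that header takes a few lines. This sketch assumes the Python `requests` library and a Retry-After value expressed in seconds, as in the demo above; any HTTP client exposes the status code and headers the same way:

```python
import time
import requests  # assumes the requests library; any HTTP client works similarly

def get_with_backoff(url: str, max_attempts: int = 5) -> requests.Response:
    """Respect 429 responses: sleep for Retry-After seconds instead of hammering."""
    for _ in range(max_attempts):
        resp = requests.get(url)
        if resp.status_code != 429:
            return resp
        wait = int(resp.headers.get("Retry-After", "1"))  # assumes seconds, not a date
        time.sleep(wait)
    return resp
```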
Fixed window: a simple counter that resets on the clock
Count requests per minute (or hour). At :00, the counter resets. Easy to implement, easy to understand. But users can burst 100 requests at :59 and 100 more at :00.
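In practice a fixed window is usually one counter per identity per window in a shared store. This sketch assumes the `redis-py` client and a reachable Redis instance; the key layout is illustrative:

```python
import time
import redis  # assumes the redis-py client and a running Redis

r = redis.Redis()

def fixed_window_allow(user_id: str, limit: int = 100, window: int = 60) -> bool:
    """One counter per user per clock window; the key expires when the window ends."""
    bucket = int(time.time()) // window          # e.g. minute number since the epoch
    key = f"rl:{user_id}:{bucket}"
    pipe = r.pipeline()
    pipe.incr(key)
    pipe.expire(key, window)
    count, _ = pipe.execute()
    return count <= limit
```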
Sliding window: smoother limits without hard resets
Instead of resetting at :00, the window slides. At 1:30, you're looking at requests from 0:30 to 1:30. No boundary bursts, but slightly more complex to track.
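One way to implement this is the sliding-window log: keep a timestamp per request and count only the ones still inside the window. A minimal in-memory sketch:

```python
import time
from collections import defaultdict, deque

LIMIT = 100
WINDOW = 60.0  # seconds

_history: dict[str, deque] = defaultdict(deque)

def sliding_window_allow(key: str) -> bool:
    """Only requests made in the last WINDOW seconds count toward the limit."""
    now = time.monotonic()
    q = _history[key]
    while q and now - q[0] > WINDOW:  # drop requests that slid out of the window
        q.popleft()
    if len(q) >= LIMIT:
        return False
    q.append(now)
    return True
```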
Token bucket: allow controlled bursts
You have a bucket that refills with tokens at a steady rate. Each request spends a token. Unused tokens accumulate up to the bucket's capacity, which is what allows bursts. When the bucket is empty, requests wait or fail.
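A token bucket fits in a few lines; the rate and capacity below are illustrative:

```python
import time

class TokenBucket:
    """Refill at a steady rate; spend one token per request; capacity caps the burst."""

    def __init__(self, rate_per_sec: float, capacity: float):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Add tokens earned since the last call, but never exceed the capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A bucket created as `TokenBucket(rate_per_sec=100/60, capacity=20)` averages 100 requests per minute but lets a quiet client burst 20 at once.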
Your AI assistant is popular. Too popular. One power user made 5,000 requests yesterday. Your OpenAI bill is now $2,000 for a single user. With this flow, you cap costs per user while keeping the service running for everyone.
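A cost cap is just rate limiting with dollars (or model tokens) as the unit. This sketch tracks a hypothetical per-user daily budget in memory and checks it before the model call; the cap and the cost estimate are illustrative:

```python
import datetime
from collections import defaultdict

DAILY_BUDGET_USD = 5.00  # illustrative per-user cap
_spend: dict[tuple[str, str], float] = defaultdict(float)

def charge(user_id: str, estimated_cost_usd: float) -> bool:
    """Refuse the request if it would push today's spend for this user over the cap."""
    today = datetime.date.today().isoformat()
    key = (user_id, today)
    if _spend[key] + estimated_cost_usd > DAILY_BUDGET_USD:
        return False
    _spend[key] += estimated_cost_usd
    return True
```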
You put rate limiting on your API gateway. Feels safe. Then an internal service calls another internal service in a loop. No gateway in between. Internal DDoS.
Instead: Rate limit at every layer that matters: gateway, service-to-service, and database queries.
You return 429 Too Many Requests but don't tell clients when to retry. They immediately retry. You're now rate limiting their retries. Infinite frustration loop.
Instead: Always include Retry-After header. Tell clients exactly when they can try again.
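For example, a handler can compute how long is left in the current window and send it back with the 429. The sketch below uses Flask purely as an example framework; the header works the same anywhere:

```python
import time
from flask import Flask, jsonify  # Flask is only an example; any server can set this header

app = Flask(__name__)

LIMIT, WINDOW = 100, 60
_count, _window_start = 0, time.time()

@app.route("/api/resource")
def resource():
    global _count, _window_start
    now = time.time()
    if now - _window_start >= WINDOW:      # new window: reset the counter
        _count, _window_start = 0, now
    _count += 1
    if _count > LIMIT:
        retry_after = int(WINDOW - (now - _window_start)) + 1
        return jsonify(error="rate limit exceeded"), 429, {"Retry-After": str(retry_after)}
    return jsonify(status="ok")
```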
You set 100 requests/minute because it felt right. Six months later, legitimate usage patterns have changed. Good users are getting blocked. Bad users found workarounds.
Instead: Monitor your rate limit hits. Alert when good users hit limits. Adjust limits based on real usage.
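Instrumenting the limiter is the easy part: count every rejection with labels you can alert on. This sketch assumes the `prometheus_client` library; the metric name and labels are illustrative:

```python
from prometheus_client import Counter  # assumes the prometheus_client library

RATE_LIMIT_HITS = Counter(
    "rate_limit_hits_total",
    "Requests rejected by the rate limiter",
    ["tier", "endpoint"],
)

def record_rate_limit_hit(tier: str, endpoint: str) -> None:
    """Count every rejection by tier and endpoint; an alert on the paid tier
    catches the moment legitimate users start getting blocked."""
    RATE_LIMIT_HITS.labels(tier=tier, endpoint=endpoint).inc()
```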
You've learned how to protect your systems from overload. The natural next step is understanding what happens when limits are hit and how to handle failures gracefully.