Back to blog
Architecture
System Design
Backend

System Design: Scaling to 10k Requests per Second

2 min read

Handling high traffic requires a shift from "making it work" to "making it scale". When an application hits 10,000 requests per second (RPS), the bottlenecks in standard web architectures become painfully visible.

Here are the key principles I follow when designing systems for this scale.

1. Decentralize the Database

The database is almost always the first bottleneck. Relational databases like PostgreSQL scale vertically very well, but eventually, you hit hardware limits.

To scale DB operations:

  • Read Replicas: Route read-only queries to replica databases. This handles read-heavy workloads (which most web apps are).
  • Sharding/Partitioning: Divide a large database into smaller, faster, more easily managed parts.

Example of configuring a connection pool in Node.js to use replicas:

const { Pool } = require('pg')

// Primary for Writes
const primaryPool = new Pool({ connectionString: process.env.DATABASE_URL })

// Replica for Reads
const replicaPool = new Pool({ connectionString: process.env.DATABASE_REPLICA_URL })

async function getUser(id) {
  // Use replica for reads
  return replicaPool.query('SELECT * FROM users WHERE id = $1', [id])
}

2. Aggressive Caching Strategy

The fastest query is the one you never make to the database.

I typically implement caching at multiple layers:

  1. CDN Level: Cache static assets and public API responses (Vercel Edge Network, Cloudflare).
  2. Application Level (Memory): Small, highly requested immutable data mapped in variables.
  3. Distributed Cache (Redis): Session data, DB query results, and rate-limiting counters.

3. Asynchronous Processing

Not everything needs to happen during the HTTP request cycle. If a user uploads an image, they don't need to wait for it to be processed and compressed.

Put the task in a queue (RabbitMQ, SQS) and process it in the background using worker nodes.

Key takeaway: Understand the latency numbers. Reading from memory is nanoseconds; a network round-trip is milliseconds. Design your critical paths to rely on the fastest data sources possible.