The truth is that cluster size matters far less than the rate of operations on any single resource Kind, especially creates and updates. Operations on different Kinds are isolated: each runs in its own goroutine protected by its own mutex. You can even shard across multiple etcd clusters by resource Kind, so modifications to different Kinds scale relatively independently.
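To make the isolation model concrete, here is a minimal Go sketch (not the actual kube-apiserver code) of per-Kind stores that each own their mutex and writer goroutine; a burst of Pod writes never contends with Lease writes:

```go
// Illustrative only: one independent, mutex-guarded store per resource Kind.
package main

import (
	"fmt"
	"sync"
)

// kindStore is a hypothetical per-Kind store guarded by its own mutex.
type kindStore struct {
	mu      sync.Mutex
	objects map[string][]byte
}

func (s *kindStore) put(key string, obj []byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.objects[key] = obj
}

func main() {
	// Contention is per Kind, not per cluster.
	stores := map[string]*kindStore{
		"Pod":   {objects: map[string][]byte{}},
		"Lease": {objects: map[string][]byte{}},
		"Node":  {objects: map[string][]byte{}},
	}

	var wg sync.WaitGroup
	for kind, store := range stores {
		wg.Add(1)
		go func(kind string, store *kindStore) { // one writer goroutine per Kind
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				store.put(fmt.Sprintf("%s/%d", kind, i), []byte("..."))
			}
		}(kind, store)
	}
	wg.Wait()
	fmt.Println("writes to different Kinds never touched the same mutex")
}
```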
The biggest source of writes is usually Lease updates that keep Nodes alive. That makes cluster size fundamentally constrained by how quickly the system can process those updates.
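For reference, the write in question looks roughly like the client-go sketch below. The coordination API calls are real, but the node name, ticker, and error handling are simplified stand-ins for what the kubelet does every renew interval (10 seconds by default):

```go
// Sketch of a node heartbeat: periodically bump the RenewTime on the node's
// Lease in the kube-node-lease namespace.
package main

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	nodeName := "node-1" // hypothetical node
	leases := client.CoordinationV1().Leases("kube-node-lease")

	// Every healthy node issues one of these updates roughly every 10 seconds,
	// so write volume grows linearly with node count.
	ticker := time.NewTicker(10 * time.Second)
	for range ticker.C {
		lease, err := leases.Get(context.TODO(), nodeName, metav1.GetOptions{})
		if err != nil {
			continue
		}
		now := metav1.NewMicroTime(time.Now())
		lease.Spec.RenewTime = &now
		if _, err := leases.Update(context.TODO(), lease, metav1.UpdateOptions{}); err != nil {
			continue
		}
	}
}
```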
A standard etcd setup on modern hardware sustains roughly 50,000 modifications per second. With careful sharding (separate etcd clusters for Nodes, Leases, and Pods), you could likely support around 500,000 nodes with standard etcd.
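The 500,000 figure follows directly from those two numbers, assuming Lease renewals dominate the write load:

```go
// Back-of-the-envelope check, assuming each node renews its Lease once per
// 10-second default interval and Lease writes go to a dedicated etcd cluster.
package main

import "fmt"

func main() {
	const (
		etcdWritesPerSecond = 50_000 // sustained modifications on one etcd cluster
		leaseRenewSeconds   = 10.0   // default kubelet Lease renew interval
	)
	// Each node contributes 1/leaseRenewSeconds writes per second, so the
	// supportable node count is throughput divided by that per-node rate.
	maxNodes := etcdWritesPerSecond * leaseRenewSeconds
	fmt.Printf("~%.0f nodes per dedicated Lease etcd cluster\n", maxNodes) // ~500000
}
```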
Replacing etcd with a more scalable backend shifts the bottleneck to the kube-apiserver’s watch cache. Today, each resource Kind’s cache is guarded by a single RWMutex over a B-tree. Swapping that structure for a hash map would likely allow roughly 100,000 events per second, enough for about 1 million nodes on current hardware. To go beyond that, increase the Lease renewal interval (e.g., >10s) to reduce the modification rate.
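One common way to exploit a hash map here is to shard it with per-shard locks so writes stop serializing on a single global lock. The sketch below illustrates that idea; it is not the real watch cache code, and the shard count is arbitrary:

```go
// Illustrative sharded store: concurrent writers contend per shard rather than
// on one RWMutex for the whole Kind.
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const shardCount = 64

type shardedStore struct {
	shards [shardCount]struct {
		mu   sync.RWMutex
		data map[string][]byte
	}
}

func newShardedStore() *shardedStore {
	s := &shardedStore{}
	for i := range s.shards {
		s.shards[i].data = map[string][]byte{}
	}
	return s
}

func (s *shardedStore) set(key string, val []byte) {
	h := fnv.New32a()
	h.Write([]byte(key))
	shard := &s.shards[h.Sum32()%shardCount]
	shard.mu.Lock()
	defer shard.mu.Unlock()
	shard.data[key] = val
}

func main() {
	store := newShardedStore()
	var wg sync.WaitGroup
	for w := 0; w < 8; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := 0; i < 10_000; i++ {
				store.set(fmt.Sprintf("node-%d-%d", w, i), []byte("event"))
			}
		}(w)
	}
	wg.Wait()
	// At ~100,000 events/s and a 10s Lease interval, that is ~1,000,000 nodes.
	fmt.Println("writers contend per shard, not on one global lock")
}
```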
At scale, the biggest aggregate limiter is Go’s garbage collector. The kube-apiserver creates and discards vast numbers of small objects when parsing and decoding resources, and this churn drives GC pressure. Adding more kube-apiserver replicas doesn’t help: each replica subscribes to the same event streams, so every one of them repeats the same decoding work and generates the same garbage.
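You can see the churn with a toy decode loop. The object shape and iteration count below are made up, but the allocate-decode-discard pattern is what the apiserver does for every incoming event:

```go
// Demonstrates allocation churn from repeated decoding and makes the resulting
// GC activity visible via runtime stats.
package main

import (
	"encoding/json"
	"fmt"
	"runtime"
)

type pod struct {
	Name   string            `json:"name"`
	Labels map[string]string `json:"labels"`
}

func main() {
	payload := []byte(`{"name":"pod-0","labels":{"app":"demo"}}`)

	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)

	// Each iteration allocates a fresh object graph and immediately drops it.
	for i := 0; i < 1_000_000; i++ {
		var p pod
		if err := json.Unmarshal(payload, &p); err != nil {
			panic(err)
		}
	}

	runtime.ReadMemStats(&after)
	fmt.Printf("allocated %d MiB, GC ran %d times\n",
		(after.TotalAlloc-before.TotalAlloc)>>20, after.NumGC-before.NumGC)
}
```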