Why Your Load Balancer Still Sends Traffic to Dead Backends

Original link: https://singh-sanjay.com/2026/01/12/health-checks-client-vs-server-side-lb.html

## Health Checks and Load Balancing: Summary

Effective health checking is essential for resilient systems, but its implementation differs depending on where load balancing happens: centrally (server-side) or inside each client (client-side).

**Server-side load balancing** (e.g. HAProxy or AWS ALB) provides a single, consistent view of service health. The load balancer probes backends periodically and needs time to detect failures (up to 15 seconds under typical settings), but it guarantees that all clients receive consistent routing. Changes are instant and centralized.

**Client-side load balancing** (e.g. gRPC or Ribbon) distributes routing intelligence to each client. This allows faster failure detection through *passive* health checks (reacting to failed requests) but introduces eventual consistency: clients may hold different views of instance health. *Active* checks (periodic probes from every client) add load to the system.

Server-side excels in simplicity and ease of operation and suits most scenarios. Client-side excels at scale, reducing latency and removing the central bottleneck, but requires more complex client logic.

Many systems use both: server-side for external access, client-side for internal service-to-service communication. Understanding both patterns and their trade-offs is essential for diagnosing routing problems and building truly resilient applications.

From the Hacker News discussion (4 points, posted by singhsanjay12):

> I wrote this article because I've run into situations where an instance was technically "up" but clearly couldn't serve traffic correctly. It explores how client-side and server-side load balancing differ in failure-detection speed, consistency, and operational complexity. I'd welcome feedback from people who have operated service meshes, Envoy/HAProxy setups, or large distributed clusters, especially on edge cases and scaling trade-offs.

Original Article

A service reports healthy. The load balancer believes it. A request lands on it and times out. Another follows. Then ten more. By the time the system reacts, hundreds of requests have drained into a broken instance while users stare at a spinner.

Health checking sounds simple: ask if something is alive, stop sending traffic if it isn’t. In practice, the mechanism behind that check, and who performs it, determines how fast your system detects failure, how accurately it responds, and how much of that complexity leaks into your application code.

The answer is fundamentally different depending on where load balancing lives: in a central proxy, or in the client itself.

Two Models for Distributing Traffic

Before getting into health checks, it helps to be precise about what each model looks like.

Server-Side Load Balancing

A dedicated proxy sits between clients and the backend fleet. Clients know one address: the load balancer. The load balancer knows the backend pool and decides where each request goes.

The load balancer is the single point of intelligence. It tracks backend health, maintains connection pools, and routes traffic. Clients are completely unaware of the backend topology; they see one stable address regardless of how many instances are behind it, or how many fail.

HAProxy, NGINX, AWS ALB, and most hardware appliances follow this model.

Client-Side Load Balancing

The routing intelligence moves into the client. Each client holds a local view of the available backend instances, typically populated from a service registry, and makes its own routing decision on every request.

There is no proxy in the request path. A service registry keeps the authoritative list of instances. Clients subscribe to updates and maintain their own routing table. gRPC’s built-in load balancing, Netflix Ribbon, and LinkedIn’s D2 all work this way. The registry often exposes instance addresses through DNS — which introduces its own propagation delays and failure modes, covered in It’s Always DNS.
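The registry-fed routing table can be sketched minimally. This is an illustrative Python sketch, not any particular library's API; the class name, the round-robin policy, and the push-style `on_registry_update` hook are all assumptions for the example:

```python
import itertools

class ClientSideBalancer:
    """Illustrative sketch: a client holds a local list of backend
    addresses (normally populated from a service registry) and picks
    one per request, round-robin. No proxy sits in the request path."""

    def __init__(self, instances):
        self.instances = list(instances)
        self._cursor = itertools.cycle(self.instances)

    def on_registry_update(self, instances):
        # The registry pushes a new authoritative list; the client
        # replaces its local routing table.
        self.instances = list(instances)
        self._cursor = itertools.cycle(self.instances)

    def pick(self):
        return next(self._cursor)

lb = ClientSideBalancer(["10.0.0.1:8080", "10.0.0.2:8080"])
print([lb.pick() for _ in range(4)])
# ['10.0.0.1:8080', '10.0.0.2:8080', '10.0.0.1:8080', '10.0.0.2:8080']
```

Every client in the fleet runs a copy of this logic, which is exactly why health state becomes a distributed problem in the next section.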

Health Checking: Who Asks, and How

The two models produce fundamentally different answers to the same question: is this instance healthy?

Health Checking in Server-Side Load Balancing

The load balancer owns health checking entirely. It runs periodic probes against each backend on a fixed schedule: typically a TCP connect, an HTTP request to a /health endpoint, or a custom command.

A typical configuration might look like:

  • Interval: probe every 5 seconds
  • Timeout: wait up to 2 seconds for a response
  • Rise threshold: 2 consecutive successes to mark healthy
  • Fall threshold: 3 consecutive failures to mark unhealthy

These thresholds exist to avoid flapping: toggling an instance in and out of rotation on a single transient failure. The downside is latency. With a 5-second interval and a fall threshold of 3, a hard failure takes up to 15 seconds to detect. During that window, real traffic continues to hit the broken instance.
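The rise/fall logic above can be sketched as a small state machine. This is an illustrative Python sketch, not any specific proxy's implementation; the names `HealthState` and `record` are invented for the example:

```python
class HealthState:
    """Sketch of rise/fall threshold logic. An instance only changes
    state after enough consecutive probe results agree, which prevents
    flapping at the cost of detection latency."""

    def __init__(self, rise=2, fall=3):
        self.rise, self.fall = rise, fall
        self.healthy = True
        self.streak = 0  # consecutive probes contradicting current state

    def record(self, probe_ok):
        if probe_ok == self.healthy:
            self.streak = 0          # state confirmed, reset the counter
            return self.healthy
        self.streak += 1
        if self.streak >= (self.rise if probe_ok else self.fall):
            self.healthy = probe_ok  # threshold crossed: flip state
            self.streak = 0
        return self.healthy

state = HealthState(rise=2, fall=3)
# One transient failure does not eject the instance...
print(state.record(False), state.record(True))               # True True
# ...but three consecutive failures do. At a 5-second probe
# interval, those three probes are the up-to-15-second window.
print(state.record(False), state.record(False), state.record(False))
# True True False
```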

Once the load balancer marks an instance unhealthy, it removes it from the rotation immediately. No client needs to be updated; the change is in one place, takes effect instantly, and is consistent for all callers.

Health Checking in Client-Side Load Balancing

With no central proxy, health checking is distributed. Each client must independently determine which instances in its local list are safe to use. There are two approaches, and most production systems use both.

Active health checks: the client (or a sidecar process) periodically probes each known instance, just like a server-side load balancer would. The difference is that every client runs its own probe loop. With 500 clients each checking 20 instances every 5 seconds, that is 2,000 probe requests per second hitting your fleet, just for health signals, before any real traffic.

Each client forms its own independent view. Two clients probing the same instance at different moments can reach different conclusions, especially during the brief window when an instance is degrading. The fleet’s health state is eventually consistent rather than authoritative.
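The probe-load arithmetic above is easy to make concrete (the function name is just for illustration):

```python
def probe_load(clients, instances, interval_s):
    """Aggregate health-probe rate when every client actively
    probes every instance on its own schedule."""
    return clients * instances / interval_s

# The example from the text: 500 clients x 20 instances, probing
# every 5 seconds, before any real traffic is served.
print(probe_load(clients=500, instances=20, interval_s=5))  # 2000.0

# Each backend instance sees clients / interval probes per second:
print(500 / 5)  # 100.0 probes/sec per instance
```

Note that the per-instance probe rate grows linearly with the number of clients, which is why active checks alone scale poorly in large fleets.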

Passive health checks (also called outlier detection or failure tracking) take a different approach: instead of probing, the client watches the outcomes of real requests. A refused connection, a timeout, or a stream of 500s are all signals that something is wrong with that instance. The client marks it unhealthy locally and stops routing to it for a backoff period.

Passive checking has a meaningful advantage: failure detection is immediate. The first failed request triggers the response; there is no polling interval to wait through. The cost is that at least one real request must fail before the client reacts. In high-throughput systems this is usually acceptable; in low-traffic or bursty scenarios it can mean more user-visible errors.
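A minimal sketch of passive ejection, assuming a single failure triggers a fixed backoff; real implementations (for example, Envoy's outlier detection) use consecutive-error thresholds and jittered ejection times, and the names here are invented for the example:

```python
import time

class OutlierDetector:
    """Sketch of passive health checking: watch real request outcomes
    and eject an instance for a backoff period after a failure."""

    def __init__(self, backoff_s=30.0, clock=time.monotonic):
        self.backoff_s = backoff_s
        self.clock = clock
        self.ejected_until = {}  # instance -> timestamp when routable again

    def record_failure(self, instance):
        # The first failed real request triggers ejection immediately;
        # there is no polling interval to wait through.
        self.ejected_until[instance] = self.clock() + self.backoff_s

    def is_routable(self, instance):
        return self.clock() >= self.ejected_until.get(instance, 0.0)

# A fake clock makes the backoff window observable without sleeping.
now = [0.0]
det = OutlierDetector(backoff_s=30.0, clock=lambda: now[0])
det.record_failure("10.0.0.2:8080")
print(det.is_routable("10.0.0.2:8080"))  # False: ejected
now[0] = 31.0
print(det.is_routable("10.0.0.2:8080"))  # True: backoff expired
```

The injectable `clock` is just a testing convenience; the point is that detection cost is one failed real request, not a probe interval times a fall threshold.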

What Each Model Gets Right

Server-side load balancing gives you a single, consistent view of fleet health. Every client gets the same routing decisions without knowing anything about the backend topology. This is operationally simple: health check configuration lives in one place, changes take effect instantly across all callers, and the backend is completely decoupled from the routing logic. At modest scale (a few dozen services and hundreds of clients), this is almost always the right default.

Client-side load balancing trades that simplicity for scale. When you have thousands of services talking to each other at high call rates, a central proxy becomes a bottleneck and a single point of failure. Removing it from the request path reduces latency and eliminates a class of infrastructure failure. Passive health checking gives clients sub-request-latency failure detection that a polling-based central proxy simply cannot match.

The cost is real: distributed health state is harder to reason about. Two clients can disagree on whether an instance is healthy. Debugging a routing anomaly requires looking at state spread across hundreds of processes rather than one. And the health check logic itself (thresholds, backoff, jitter) needs to live in every client library, tested and maintained across every language your organization uses.

Choosing Between Them

There is no universal answer. The right model depends on your fleet size, call rates, operational maturity, and how much complexity you can manage in client libraries.

Server-side load balancing is simpler to operate and reason about. For most teams and most services, it is the right starting point.

Client-side load balancing pays off when scale makes a central proxy genuinely painful: when the proxy itself becomes a bottleneck, when you need sub-millisecond failure detection, or when the overhead of a proxy hop is measurable and matters.

Many large systems end up using both: server-side load balancing at the ingress layer where clients are external and uncontrollable, and client-side load balancing for internal service-to-service calls where the client library can be standardized. The health checking story in each layer is different, the failure modes are different, and understanding both is what lets you reason clearly about where traffic actually goes when things go wrong.
