Falsehoods programmers believe about TCP

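The original post discusses TCP (Transmission Control Protocol) and its reliability. It opens with a quoted account of personal experience with NetworkManager (a utility for managing network connections) and `wpa_supplicant` (a tool for configuring WiFi connections): poor signal quality and an unstable wireless environment caused frequent disconnections, which made applications behave erratically. In response, the author pushes back on common beliefs about TCP and lists a number of falsehoods about its behavior. For example, TCP may be described as reliable, but the author points out that this does not necessarily mean all transmitted data reaches its destination, nor that the sender and receiver will always agree on exactly which bytes were sent and received. Building a guarantee of that kind with a higher-level application protocol is not simple either, because solving the underlying agreement problem requires more than two nodes (as in the Paxos or Raft algorithms). Networks also do not always behave according to standard protocols, so it is important to account for non-standard behavior when designing and implementing systems. Finally, the thread touches on congestion control, noting that opening more TCP connections may not improve speed if congestion within the network is not handled properly. The conversation also mentions quirks such as networks that block Internet Control Message Protocol (ICMP) packets or drop traffic they do not recognize. In short, TCP has limitations and should not be treated as flawless. Network administrators and developers should take network conditions, inconsistencies, and potential anomalies into account when using TCP and designing networked applications.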

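The following is a summary of the original discussion:

* Many networks are imperfect and do not follow standard protocols, so application engineers need not worry about them.
* If TCP is unreliable on a particular path, there is little that can be done other than adding an ad hoc implementation in the application.
* Some of the assumptions made in the text are obvious or unnecessary; for example, the claim that names cannot always be spelled in Unicode is absurd.
* Socket programming resources that offer reliable guidance include Beej's Guide to Network Programming and Mad Wizard's WinSock tutorial.
* The choice between blocking and non-blocking sockets determines how a program structures its networking: blocking sockets need threads or processes, while non-blocking sockets need polling and an event loop.
* Current approaches to networking use asynchronous functions (`await`) and an event loop that handles non-blocking sockets and passes data to event handlers.
* Fragmented packets may lack a TCP header (only the first IP fragment carries it), which causes problems when filtering packets with Wireshark.
* When converting a datagram-based protocol to a stream-oriented one, the individual datagrams must be distinguished from one another, which leads to creative solutions such as adding a content-length header.
* Some networks may treat multiple requests as a single stream, which causes confusion when the requests become hard to tell apart.
* Several clever eBPF examples demonstrate creative solutions to complex networking problems.

To clarify, the original text discusses common problems faced when working with networks, particularly TCP reliability, the handling of fragmented packets, and the handling of multiple requests within one stream. It mentions some valuable resources for learning networking and offers insight into how applications can cope with network challenges. It also covers the difficulty of carrying a datagram-based protocol over a stream-oriented medium and stresses the importance of correctly identifying the individual datagrams in the stream. Finally, it shows the value of creative solutions, such as a content-length header, for overcoming these obstacles.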
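As a rough illustration of the content-length idea and the `await`-style event loop mentioned above, here is a minimal sketch in Python. The 4-byte big-endian length prefix and the helper names `encode_frame` and `read_frame` are assumptions made for this example, not anything specified in the discussion.

```python
import asyncio
import struct

# Assumed wire format for this sketch: 4-byte big-endian length, then payload.
HEADER = struct.Struct("!I")


def encode_frame(payload: bytes) -> bytes:
    """Prefix a datagram with its length so it can be recovered from a stream."""
    return HEADER.pack(len(payload)) + payload


async def read_frame(reader: asyncio.StreamReader) -> bytes:
    """Read exactly one length-prefixed frame from the byte stream."""
    header = await reader.readexactly(HEADER.size)
    (length,) = HEADER.unpack(header)
    return await reader.readexactly(length)


async def demo() -> list[bytes]:
    # An in-memory reader stands in for a TCP connection here.
    reader = asyncio.StreamReader()
    wire = encode_frame(b"first datagram") + encode_frame(b"second")
    # Feed the bytes in 5-byte chunks that ignore frame boundaries,
    # much as a stream transport is free to do.
    for i in range(0, len(wire), 5):
        reader.feed_data(wire[i:i + 5])
    reader.feed_eof()
    return [await read_frame(reader), await read_frame(reader)]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The reader never relies on how the bytes were chunked in transit; it only trusts the length prefix.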

Posted Sep 13, 2024 22:42 UTC (Fri) by NYKevin (subscriber, #129325)
In reply to: NetworkManager or networkd by mathstuf
Parent article: Debating ifupdown replacements for Debian trixie

> FWIW, I dropped NetworkManager years ago for `wpa_supplicant`-based management because I had flaky wireless situations (thick concrete walls in the dorms, roaming across campus, etc.) and any whiff of packet loss would announce to the whole machine "no network" and apps would start to freak out and react. However, it was likely to be back Real Soon™ and normal TCP recovery would make it "transparent" (if with a spike in latency).

Somebody ought to write one of those "falsehoods programmers believe" articles for TCP, because this is just reflective of a broader trend of software that thinks it knows better than TCP, and usually does not. Here, I'll even get the ball rolling (remember, all of the following statements are *false* at least some of the time, but for some of these, perhaps not very often):

1. TCP is reliable, so everything I send will be received by the other end.
2. OK, mostly reliable.
3. OK, fine, it's not reliable (in the above sense of the word), but the sender and recipient will always eventually agree on exactly which bytes made it over the transport.
4. It is possible to create a guarantee analogous to (3) by building some message-oriented application-level protocol on top of TCP, such as HTTP or SMTP.
5. There is such a thing as a TCP packet.
6. There is no such thing as a TCP packet.
7. If we fail to connect to a well-known remote host, then we must be offline.
8. Nagle's algorithm is good.
9. Nagle's algorithm is bad.
10. I don't have to care about Nagle's algorithm.
11. This is all low-level pedantry. I can think of TCP like a two-way Unix pipe that goes over the network, and completely ignore how it is implemented.
12. If the network is transparent to TCP, then it must be transparent to IP.
13. If the network is transparent to HTTP/1.1, then it must be transparent to TCP.
14. Weird networks that are not transparent to standard protocols are an aberration. I can safely ignore them.
15. TCP is implemented in terms of IP.

Explainer for 1-4: https://en.wikipedia.org/wiki/Two_Generals%27_Problem. TL;DR: If the connection breaks while an ACK is outstanding, the sender will have no way of knowing whether the segment was received, and this turns out to be an insoluble problem no matter how much complexity you pile on top of it. You need something resembling Paxos or Raft to get a guarantee like that, and that always requires a minimum of three nodes, so it can't be built on top of a single two-party TCP stream. See RFC 1047 for an SMTP-specific discussion of this problem (which still applies to modern SMTP, since RFC 2821 says that implementations MUST follow 1047's core advice), but note that some variation of this problem applies to literally every two-party TCP service (and for that matter, every UDP or IP service as well), regardless of how it works or what abstractions it introduces. SMTP is only special in that both sides are explicitly required to care about whether the message was received or not, which is marginally unusual for TCP services (compare and contrast: FTP file uploads, HTTP POST and PUT, etc., most of which omit significant discussion of client retry logic in favor of leaving it up to the application or end user).
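To make the ambiguity concrete, here is a toy model (not real networking code) of the two ways a connection can break while an ACK is outstanding. The names `Failure`, `Outcome`, and `send_over_broken_link` are made up for this sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Failure(Enum):
    SEGMENT_LOST = "segment lost before delivery"
    ACK_LOST = "segment delivered, but the ACK was lost"


@dataclass
class Outcome:
    receiver_got: bytes    # ground truth, invisible to the sender
    sender_saw_ack: bool   # the only thing the sender can observe


def send_over_broken_link(payload: bytes, failure: Failure) -> Outcome:
    """Model one send where the link dies before the ACK comes back."""
    if failure is Failure.SEGMENT_LOST:
        return Outcome(receiver_got=b"", sender_saw_ack=False)
    # ACK_LOST: the receiver has the data, but the sender never hears about it.
    return Outcome(receiver_got=payload, sender_saw_ack=False)


if __name__ == "__main__":
    for failure in Failure:
        outcome = send_over_broken_link(b"COMMIT", failure)
        print(failure.value, "->", outcome)
```

In both cases the sender observes the same thing (no ACK), even though the receiver's state differs, which is why retrying blindly risks duplicates and giving up risks loss.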

15 is left as an exercise for the reader (hint: it is primarily of historical interest, but I'm not sure it's possible to entirely rule out modern counterexamples, since we don't know what weird stuff is going on in [any large organization]'s private network).
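For items 8-10, the practical upshot is that Nagle's algorithm is a per-socket setting that an application may need to choose deliberately. A minimal sketch using Python's standard `socket` module follows; the helper names `make_tcp_socket` and `nagle_disabled` are hypothetical, and neither setting is claimed to be right in general.

```python
import socket


def make_tcp_socket(disable_nagle: bool) -> socket.socket:
    """Create an unconnected TCP socket with Nagle's algorithm on or off."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY=1 disables Nagle's algorithm (small writes go out promptly);
    # TCP_NODELAY=0 leaves it enabled (small writes may be coalesced).
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1 if disable_nagle else 0)
    return sock


def nagle_disabled(sock: socket.socket) -> bool:
    """Report whether TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    for flag in (True, False):
        s = make_tcp_socket(disable_nagle=flag)
        print("TCP_NODELAY set:", nagle_disabled(s))
        s.close()
```

Which value is appropriate depends on the traffic pattern: many tiny latency-sensitive writes and bulk transfers pull in opposite directions.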

NetworkManager or networkd

Posted Sep 14, 2024 12:24 UTC (Sat) by paulj (subscriber, #341)

Posted Sep 15, 2024 18:37 UTC (Sun) by NYKevin (subscriber, #129325)

My point is not that there is no set of bytes the parties agree on. My point is that it is not possible for either party to know exactly which bytes are in the consensus set.

Posted Sep 15, 2024 18:37 UTC (Sun) by NYKevin (subscriber, #129325)

(To clarify: It is possible for a party to know the consensus set contains *at least* the first N bytes. It is not possible for either party to know that the consensus set contains *exactly* the first N bytes.)
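One way to picture the "at least" versus "exactly" distinction is as sender-side bookkeeping with a lower and an upper bound. The sketch below assumes a simplified cumulative-ACK model; the class `SendLedger` and its methods are invented for illustration and do not correspond to any real socket API.

```python
class SendLedger:
    """Track what a sender can know about delivery under cumulative ACKs."""

    def __init__(self) -> None:
        self.sent = 0    # bytes handed to the transport so far
        self.acked = 0   # highest cumulative ACK seen so far

    def on_send(self, nbytes: int) -> None:
        self.sent += nbytes

    def on_ack(self, upto: int) -> None:
        if upto > self.sent:
            raise ValueError("ACK for bytes that were never sent")
        # Cumulative ACKs only move forward.
        self.acked = max(self.acked, upto)

    def delivered_bounds(self) -> tuple[int, int]:
        """Return (lower, upper) bounds on bytes the peer actually received.

        The sender knows at least `acked` bytes arrived, and at most `sent`
        bytes could have. If the connection breaks now, the true figure is
        somewhere in that range and the sender cannot narrow it further.
        """
        return (self.acked, self.sent)


if __name__ == "__main__":
    ledger = SendLedger()
    ledger.on_send(100)
    ledger.on_ack(60)
    print(ledger.delivered_bounds())
```

The gap between the two bounds is exactly the data whose fate is unknown when the connection dies.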

Or networks that block ICMP, or networks that drop anything they don't understand...

Posted Sep 14, 2024 21:11 UTC (Sat) by Sesse (subscriber, #53779)

16. I don't need to know anything about congestion control (a sub-category of this one is “If I don't get the speed I want, I should open multiple TCP connections”)
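A back-of-the-envelope model hints at why opening more connections is not a free speedup. The sketch below assumes an idealized bottleneck link that is shared equally per flow, which real congestion control only approximates; the function `my_share_mbps` and the numbers are illustrative assumptions, not measurements.

```python
def my_share_mbps(capacity_mbps: float, my_flows: int, other_flows: int) -> float:
    """Throughput I get on a bottleneck shared equally among all flows.

    Idealized assumption: every flow receives capacity / total_flows.
    """
    if my_flows < 1 or other_flows < 0:
        raise ValueError("need at least one of my flows and no negative counts")
    return capacity_mbps * my_flows / (my_flows + other_flows)


if __name__ == "__main__":
    # Alone on the link: extra connections cannot exceed the link capacity.
    print(my_share_mbps(100.0, 1, 0), my_share_mbps(100.0, 8, 0))
    # Sharing with 9 other flows: extra connections only take from the others.
    print(my_share_mbps(100.0, 1, 9), my_share_mbps(100.0, 4, 9))
```

Under these assumptions, extra connections either gain nothing (when the link is already yours) or gain only at other users' expense, and they do nothing about the congestion itself.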