DNS Is for People – Not for IT Infrastructure

Original link: https://louwrentius.com/dns-is-for-people-not-for-it-infrastructure.html

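Although the Domain Name System (DNS) is essential for public-facing services, this article questions whether it is necessary for internal IT infrastructure. The author argues that because DNS is often a critical dependency, its failure can cause disproportionately large outages, such as the well-known Meta/Facebook incident. Beyond reliability, the article points out several drawbacks of using DNS for machine-to-machine communication:

* **Complexity:** DNS introduces unnecessary overhead and configuration hurdles, such as managing time-to-live (TTL) caching and the potential burden of implementing DNSSEC.
* **Security risk:** DNS is usually unencrypted and vulnerable to spoofing. It also poses a significant egress exfiltration risk, because attackers can use DNS queries to bypass network filters and leak sensitive data.

The author proposes an alternative: drop DNS for internal infrastructure and instead inject IP addresses directly into configuration files, or manage hostnames through `/etc/hosts`. By reducing the number of moving parts, engineers can build more robust, predictable, and secure systems. Ultimately, although DNS is a useful tool, teams should weigh its benefits against the extra risk and complexity it introduces into an internal architecture.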

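This Hacker News discussion criticizes an article that argues for abandoning DNS in favour of hard-coded IP addresses in configuration files. Commenters overwhelmingly consider the proposal impractical and technically flawed. The main counterarguments are:

* **Operational rigidity:** Compared with the simplicity of updating a single DNS record, relying on hard-coded IPs makes infrastructure maintenance such as server migrations or IP changes extremely difficult.
* **Maintenance cost:** Critics argue that replacing DNS (a centralized, standardized system) with manually managed configuration files introduces unnecessary complexity, increases the chance of configuration drift, and creates a "high-maintenance" environment.
* **Scalability concerns:** Replacing DNS with a local solution such as `/etc/hosts` is described as "insane", because it effectively forces every machine to act as its own bespoke DNS server and gives up the advantages of central management.

Overall, the community sees the article's premise as a step backwards in infrastructure management and prefers the reliability and flexibility of DNS over the fragility of hard-coded addresses.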

Original article

The Domain Name System exists because it's difficult for people to remember IP addresses (185.15.59.224) and much easier to remember domain names (wikipedia.org).

Regarding internet-accessible services, it makes sense to publish websites, API endpoints or similar services using DNS, as people have to interact with them. The added benefit of a domain name is that the associated IP address can change without the client being affected.

This article isn't against DNS for public services, but it questions whether we should use DNS for internal IT infrastructure (independent of cloud vs. onprem).

It's always DNS

Although DNS can be a very beneficial service, it can also become a liability. If you want a reliable system, you want as few components as possible. Every additional component adds a potential risk of failure. In addition, more components may create unforeseen behaviour and interactions that can cause outages (circular dependencies, and so on). If you can avoid adding components, you'll have a better chance of building a reliable system.

Within the IT operations space, DNS has made a bit of a name for itself. Many may remember this little haiku.

It’s not DNS
There’s no way it’s DNS
It was DNS

(source)

There are multiple(1) high-profile(2) incidents where DNS was involved. In these linked cases, the root cause of the incident isn't the DNS system itself. Yet, because the root cause affects the DNS service, which is in the critical path for virtually all services, the incident has such a huge impact.

The Facebook / Meta outage was so significant because it locked people out of buildings (physical access) due to 'circular' dependencies on DNS being available. Again, it can be said that the circular dependency is the root cause, but the blast radius of DNS is in many cases so enormous that it may be difficult to have a clear end-to-end picture of potential risk.

The case against DNS for internal IT infrastructure

From the perspective of IT operations, DNS has a drawback: DNS clients cache DNS records based on TTL. Different DNS client implementations can behave differently, but even if you have a fairly homogeneous environment, the only way to ensure that clients (in this case other servers) use the updated IP address is to control them and force a DNS refresh.
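To make the caching issue concrete, here is a minimal sketch in Python of a client-side cache that honours TTL. The class name `TtlCache`, the hostname `db.internal` and the addresses are all hypothetical, and real resolver implementations differ in their details.

```python
class TtlCache:
    """Toy client-side DNS cache: answers are reused until their TTL expires."""

    def __init__(self):
        self._entries = {}  # name -> (ip, expiry time in seconds)

    def resolve(self, name, authoritative, now):
        cached = self._entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # still within TTL: the server is not asked again
        ip, ttl = authoritative[name]
        self._entries[name] = (ip, now + ttl)
        return ip


# Hypothetical record set: name -> (ip, ttl in seconds)
zone = {"db.internal": ("10.0.0.5", 300)}
cache = TtlCache()

first = cache.resolve("db.internal", zone, now=0)

# The operator moves the database and updates the record at t=10.
zone["db.internal"] = ("10.0.0.9", 300)

stale = cache.resolve("db.internal", zone, now=10)   # served from cache
fresh = cache.resolve("db.internal", zone, now=300)  # TTL expired, asked again
```

In this sketch the client keeps using the old address until the TTL runs out, no matter when the record was changed on the server side.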

That got me thinking, why would we use DNS for infrastructure services? It isn't necessary for machine-to-machine communication. Instead of configuring domain names that may not resolve, we can just directly inject the appropriate IP address(es) into configuration files. It's easy to configure systems with tools like Ansible or pyinfra at scale.

The counterargument could be that DevOps / platform engineers are also humans, and it's much easier to spot misconfigurations or to troubleshoot if domain names are configured instead of IP addresses.

Fortunately, we still have /etc/hosts, which we can easily provision. Still no DNS service required! This way, we can configure domain names and pretend to use DNS. I also suspect that DNS queries against /etc/hosts are quite responsive.
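As a rough illustration of what provisioning /etc/hosts could look like, the Python sketch below renders hosts-file lines from a simple inventory mapping. The function name `render_hosts` and the inventory entries are hypothetical; in practice the data would more likely come from the Ansible or pyinfra inventory itself.

```python
import ipaddress


def render_hosts(inventory):
    """Render /etc/hosts-style lines from a {hostname: ip} mapping."""
    lines = ["127.0.0.1 localhost"]
    for name in sorted(inventory):
        # ip_address() raises ValueError on a malformed address,
        # so a typo is caught before the file is distributed.
        ip = ipaddress.ip_address(inventory[name])
        lines.append(f"{ip} {name}")
    return "\n".join(lines) + "\n"


# Hypothetical inventory of internal hosts.
inventory = {
    "db01.internal": "10.0.0.5",
    "app01.internal": "10.0.0.21",
}
hosts_text = render_hosts(inventory)
```

Sorting the hostnames keeps the generated file stable between runs, which makes changes easy to review in a diff.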

DNS as generic security risk

As of today, most network traffic is encrypted by default, or tunneled through an encrypted channel. DNS is, by default, the exception. Regarding internal IT infrastructure (cloud or 'onprem'), the network may be considered a secure environment. An attack on the DNS service, such as spoofing packets, can be very disruptive though. Setting up DNSSEC may alleviate this problem, but it also introduces another administrative burden with its own risk of misconfiguration. It's yet another layer of complexity. And this assumes that the internal infrastructure supports DNSSEC.

DNS as an Egress Exfiltration risk

Egress filtering (filtering of outbound connections) can be cumbersome, so it's often omitted on the grounds that the systems involved are 'trusted'. This is unfortunate, as it makes life easier for an attacker. Any kind of resource required for an attack can be acquired on the vulnerable system with a simple outbound query towards the internet. Proper egress filtering of network traffic can be the difference between a successful and an unsuccessful hacking attempt.

A lack of egress filtering also makes it much easier for an attacker to exfiltrate data. And the thing is: any IP protocol can be used to exfiltrate data, including DNS.

This is how: the attacker gets a domain and runs their own internet-accessible authoritative nameserver for this domain. Now the attacker can make DNS requests for names under that domain, like sensitivedata.evil.domain, from the hacked system and extract all the data from the rogue DNS server's logs.

Although a hacked server may not be able to directly interact with the attacker-controlled DNS server, by issuing DNS requests for the attacker-controlled domain, these requests will pass the local forwarding DNS server and be forwarded towards the attacker-controlled authoritative DNS server. See also tools like dnscat2 or iodine.
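Because the data travels in the labels to the left of the attacker's domain, such queries often carry unusually long or random-looking names. The Python sketch below is one possible defensive heuristic for spotting them in resolver logs. The function names and both thresholds are illustrative assumptions, not tuned values, and a short plain-word label would pass unnoticed.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Bits per character of the given non-empty string."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_exfiltration(qname, max_label_len=40, max_entropy=3.5):
    """Flag query names whose leftmost label is unusually long or random-looking.

    Both thresholds are illustrative assumptions, not tuned values.
    """
    label = qname.rstrip(".").split(".")[0]
    if not label:
        return False
    return len(label) > max_label_len or shannon_entropy(label) > max_entropy


ordinary = looks_like_exfiltration("www.wikipedia.org")
encoded = looks_like_exfiltration("a1b2c3d4e5f6g7h8.evil.domain")
```

A heuristic like this only helps with detection after the fact. It does not replace blocking outbound resolution in the first place.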

Due to this risk, there is a case to be made for, at the very least, not allowing systems to query public DNS records. As servers may need to interact with services on the internet (update servers, APIs, and so on), such access can be facilitated by a proxy server using allow-listed domains.
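The allow-list check such a proxy performs can be sketched in a few lines of Python. This is a simplified illustration rather than the configuration of any particular proxy product; the function name `is_allowed` and the listed domains are hypothetical.

```python
def is_allowed(hostname, allowed_domains):
    """Return True if hostname equals an allow-listed domain or is a subdomain of one."""
    host = hostname.lower().rstrip(".")
    for domain in allowed_domains:
        domain = domain.lower().rstrip(".")
        # Match on a label boundary so "xdeb.debian.org" does not pass
        # just because it ends with "deb.debian.org".
        if host == domain or host.endswith("." + domain):
            return True
    return False


# Hypothetical allow-list for update servers and APIs.
ALLOWED = {"deb.debian.org", "api.example.com"}

update_ok = is_allowed("deb.debian.org", ALLOWED)
exfil_ok = is_allowed("sensitivedata.evil.domain", ALLOWED)
```

Matching on the full domain suffix matters here: a name such as deb.debian.org.evil.domain contains an allow-listed string but still belongs to the attacker's domain, and this check rejects it.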

Evaluation and closing words

In the end, everything is a tradeoff, where people must balance benefits and drawbacks against the context of their infrastructure, their particular risk appetite and even organisational structure and culture.

That said, I think it's reasonable to explore if DNS can be avoided altogether within the IT infrastructure to increase reliability and robustness.

Feel free to share your thoughts and feelings about this if you feel so inclined.