I am making steady progress towards moving the Computers Are Bad enterprise cloud to its new home, here in New Mexico. One of the steps in this process is, of course, purchasing a new server... the current Big Iron is getting rather old (probably about a decade!) and here in town I'll have the rack space for more machines anyway.
In our modern, cloud-centric industry, it is rare that I find myself comparing the specifications of a Dell PowerEdge against an HP ProLiant. Because the non-hyperscale server market has increasingly consolidated around Intel specifications and reference designs, it is even rarer that there is much of a difference between the major options.
This brings back to mind one of those ancient questions that comes up among computer novices and becomes a writing prompt for technology bloggers. What is a server? Is it just, like, a big computer? Or is it actually special?
There's a lot of industrial history wrapped up in that question, and the answer is often very context-specific. But there are some generalizations we can make about the history of the server: client-server computing originated mostly as an evolution of time-sharing computing using multiple terminals connected to a single computer. There was no expectation that terminals shared an architecture with the computers they connected to (and indeed they were usually vastly simpler machines), and that attitude carried over to client-server systems. The PC revolution instilled a Wintel monoculture in much of client-side computing by the mid-'90s, but it remained common into the '00s for servers to run entirely different operating systems and architectures.
The SPARC and Solaris combination was very common for servers, as were IBM's minicomputer architectures and their numerous operating systems. Indeed, one of the key commercial contributions of Java was the way it allowed enterprise applications to be written for a Solaris/SPARC backend while enabling code reuse for clients that ran on either stalwarts like Unix/RISC or "modern" business computing environments like Windows/x86. This model was sometimes referred to as client-server computing with "thick clients." It preserved the differentiation between "server" and "client" as classes of machines, and the universal adherence of serious business software to this model led to an association between server platforms and "enterprise computing."
Over time, things have changed, as they always do. Architectures that had been relegated to servers became increasingly niche and struggled to compete with the PC architecture on cost and performance. The general architecture of server software shifted away from vertical scaling and high-uptime systems to horizontal scaling with relaxed reliability requirements, taking away much of the advantage of enterprise-class computers. For the most part, today, a server is just a big computer. There are some distinguishing features: servers are far more likely to be SMP or NUMA, with multiple processor sockets. While the days of SAS and hardware RAID are increasingly behind us, servers continue to have more complex storage controllers and topologies than clients. And servers, almost by definition, offer some sort of out-of-band management.
Out-of-band management, sometimes also called lights-out management, refers to a capability that is almost unheard of in clients. A separate, smaller management computer allows for remote access to a server even when it is, say, powered off. The terms out-of-band and in-band in this context emerge from their customary uses in networking and telecom, meaning that out-of-band management is performed without the use of the standard (we might say "data plane") network connection to a machine. But in practice they have drifted in meaning, and it is probably better to think of out-of-band management as meaning that the operating system and general-purpose components are not required. This might be made clearer by comparison: a very standard example of in-band management would be SSH, a service provided by the software on a computer that allows you to interact with it. Out-of-band management, by contrast, is provided by a dedicated hardware and software stack and does not require the operating system or, traditionally, even the CPU to cooperate.
You can imagine that this is a useful capability. Today, out-of-band management is probably best exemplified by the remote console that most servers offer. It's basically an embedded IP KVM, allowing you to interact with the machine as if you were at a locally connected monitor and keyboard. A lot of OOB management products also offer "virtual media," where you can upload an ISO file to the management interface and then have it appear to the computer proper as if it were a physical device. This is extremely useful for installing operating systems.
OOB management is an interesting little corner of computer history. It's not a new idea at all; in fact, similar capabilities can be found through pretty much the entire history of business computing. If anything, it's gotten simpler and more boring over time. A few evenings ago I was watching a clabretro video about an IBM p5 he's gotten working. As is the case in most of his videos about servers, he has to give a brief explanation of the multiple layers of lower-level management systems present in the p5 and their various textmode and web interfaces.
If we constrain our discussion of "servers" to relatively modern machines, starting say in the late '80s or early '90s, there are some common features:
- Some sort of local operator interface (this term itself being a very old one), like an LCD matrix display or grid of LED indicators, providing low-level information on hardware health.
- A serial console with access to the early bootloader and a persistent low-level management system.
- A higher-level management system, with a variable position in the stack depending on architecture, for remote management of the machine workload.
A lot of this stuff still hangs around today. Most servers can tell you on the front panel if a redundant component like a fan or power supply has failed, although the number of components that are redundant and can be replaced online has dwindled with time from "everything up to and including CPUs" on '90s prestige architectures to sometimes little more than fans. Serial management is still pretty common, mostly as a holdover from its days as a popular way to do OS installation and maintenance on headless machines [1].
But for the most part, OOB management has consolidated in the exact same way as processor architecture: onto Intel IPMI.
IPMI is confusing to some people for a couple of reasons. First, IPMI is a specification, not an implementation. Most major vendors have their own implementation of IPMI, often with features above and beyond the core IPMI spec, and they give them weird names like HP iLO and Dell DRAC. These vendor-specific implementations often predate IPMI, too, so it's never quite right to say they are "just IPMI." They're independent systems with IPMI characteristics. On the other hand, smaller, more upstart manufacturers are likelier to just call it IPMI, in which case it may just be the standard offering from their firmware vendor.
Further confusing matters is a fair amount of terminological overlap. The IPMI software runs on a processor conventionally called the baseboard management controller or BMC, and the terms IPMI and BMC are sometimes used interchangeably. Lights-out management or LOM is mostly an obsolete term but sticks around because HP(E) is a fan of it and continues to call their IPMI implementation Integrated Lights-Out. The BMC should not be confused with the System Management Controller or SMC, which is one of a few terms used for a component present in client computers to handle tasks like fan speed control. These have an interrelated history and, indeed, the BMC handles those functions in most servers.
IPMI also specifies two interfaces: an out-of-band interface available over the network or a serial connection, and an in-band interface available to the operating system via a driver (and, in practice, I believe communication between the CPU and the baseboard management controller usually happens via the low-pin-count or LPC bus, a weird little holdover of ISA present in most modern computers). The result is that you can interact with the IPMI from a tool running in the operating system, like ipmitool on Linux. That can make it a little confusing to tell what exactly is going on, if you don't understand that the IPMI is a completely independent system that has a local interface to the running operating system for convenience.
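To make the two paths concrete, here's a minimal sketch of both, wrapping ipmitool in Python's subprocess module. The address and credentials are made up, and I'm assuming a Linux box with the IPMI driver loaded for the in-band case:

    import subprocess

    def ipmi(args, host=None, user=None, password=None):
        # With a host, go out-of-band: RMCP+ ("lanplus") straight to
        # the BMC's own network interface, no cooperation from the
        # target's operating system required.
        cmd = ["ipmitool"]
        if host:
            cmd += ["-I", "lanplus", "-H", host, "-U", user, "-P", password]
        # Without a host, ipmitool takes the in-band path: the OS's
        # IPMI driver (/dev/ipmi0 on Linux) talking to the local BMC.
        cmd += args
        return subprocess.run(cmd, capture_output=True, text=True,
                              check=True).stdout

    # The same question asked both ways, of the same BMC:
    print(ipmi(["mc", "info"]))                                   # in-band
    print(ipmi(["mc", "info"], "10.0.0.42", "ADMIN", "hunter2"))  # out-of-band

Either way you're talking to the same independent system; only the path to it differs.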
What does the IPMI actually do? Well, like most things, it's mostly become a webapp. Web interfaces are just too convenient to turn down, so while a lot of IPMI products do have dedicated client software, they're porting all the features into an embedded web application. The quality of these web interfaces varies widely but is mostly not very good. That raises a question, of course, of how you get to the IPMI web interface.
Most servers on the market have a dedicated ethernet interface for the IPMI, often labelled "IPMI" or "management" or something like that. Most people would agree that the best way to use IPMI is to put the management network interface onto a dedicated physical network, for reasons of both security and reliability (IPMI should remain accessible even in case of performance or reliability problems with your main network). A dedicated physical network costs time, space, and money, though, so there are compromises. For one, your "management network" is very likely to be a VLAN on your normal network equipment. That's sort of like what AT&T calls a common-carrier switching arrangement, meaning that it behaves like an independent, private network but shares all of the actual equipment with everything else, the isolation being implemented in software. That was a weird comparison to make and I probably just need to write a whole article on CCSAs like I've been meaning to.
Even that approach requires extra cabling, though, so IPMI offers "sideband" networking. With sideband management, the BMC communicates directly with the same NIC that the operating system uses. The implementation is a little bit weird: the NIC will pretend to be two different interfaces, mixing IPMI traffic into the same packet stream as host traffic but using a different MAC address. This way, it appears to other network equipment as if there were two separate network interfaces in use, just as there would be with a dedicated management port. I will leave judgment as to how good of an idea this is to you, but there are obvious security considerations around reducing the segregation between IPMI and application traffic.
And yes, it should be said, a lot of IPMI implementations have proven to be security nightmares. They should never be accessible to any untrusted person.
Details of network features vary between IPMI implementations, but there is a standard interface, RMCP, on UDP port 623 that can be used for discovery and basic commands. There's often SSH and a web interface, and VNC is pretty common for remote console.
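That discovery mechanism is the RMCP "presence ping" inherited from the ASF specification. If I have the byte layout right, the probe is small enough to build by hand; this sketch (hypothetical target address) just asks UDP 623 whether a BMC is listening:

    import socket

    # RMCP presence ping: a 4-byte RMCP header (version 6, reserved,
    # sequence 0xFF meaning "no ack", class 6 = ASF), then an ASF
    # message (IANA enterprise 4542, message type 0x80 = ping).
    PING = bytes([0x06, 0x00, 0xFF, 0x06,
                  0x00, 0x00, 0x11, 0xBE,
                  0x80, 0x00, 0x00, 0x00])

    def bmc_present(host, timeout=2.0):
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.settimeout(timeout)
            s.sendto(PING, (host, 623))
            try:
                reply, _ = s.recvfrom(512)
            except socket.timeout:
                return False
        # A conforming BMC answers with a presence pong, type 0x40.
        return len(reply) > 8 and reply[8] == 0x40

    print(bmc_present("10.0.0.42"))  # hypothetical management address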
There are some neat basic functions you can perform with the IPMI, either over the network or locally using an in-band IPMI client. A useful one, if you are forgetful and keep poor records like I do, is listing the hardware modules making up the machine at an FRU or vendor part number level. You can also interact with basic hardware functions like sensors, power state, fans, etc. IPMI offers a standard watchdog timer, which can be combined with software running on the operating system to ensure that the server will be reset if the application gets into an unhealthy state. You should set a timeout long enough to allow the system to boot and for you to connect and disable the watchdog timer; ask me how I know.
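The watchdog pattern looks something like the sketch below: software pets the timer while a health check passes, and if the whole machine wedges, the BMC's countdown expires and it takes the configured action (usually a hard reset). This assumes in-band ipmitool and a watchdog that has already been armed with that generous timeout; healthy() is a stand-in for whatever your application considers proof of life:

    import subprocess, time

    def healthy():
        # Stand-in: check your application however makes sense
        # (hit a local health endpoint, check a queue depth, etc.)
        return True

    while True:
        if healthy():
            # "mc watchdog reset" restarts the BMC's countdown from
            # the top. If this loop, the OS, or the health check dies,
            # the timer runs out and the BMC resets the machine.
            subprocess.run(["ipmitool", "mc", "watchdog", "reset"],
                           check=True)
        time.sleep(30)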
One of the reasons I thought to write about IPMI is its strange relationship to the world of everyday client computers. IPMI is very common in enterprise servers but very rare elsewhere, much to the consternation of people like me who don't have the space or noise tolerance for a 1U pizzabox in their homes. If you are trying to stick to compact or low-power computers, you'll pretty much have to go without.
But then, there's kind of a weird exception. What about Intel ME and AMD ST? These are essentially OOB management controllers that are present in virtually all Intel and AMD processors. This is kind of an odd story. Intel ME, the Management Engine, is an enabling component of Intel Active Management Technology (Intel AMT). AMT was pretty much an attempt at popularizing OOB management for client machines, and it offers most of the same capabilities as IPMI. It has been considerably less successful. Most of that is probably due to pricing: Intel has limited almost all AMT features to use with their very costly enterprise management platforms. Perhaps there is some industry in which these sell well, but I am apparently not in it. There are open-source AMT clients, but the next problem you will run into is finding a machine where AMT is actually usable.
The fact that Intel AMT has sideband management capability, and that therefore the Intel ME component on which AMT runs has sideband management capability, was the topic of quite some consternation in the security community. Here is a mitigating factor: sideband management is only possible if the processor, motherboard chipset, and NIC are all AMT-capable. Options for all three devices are limited to Intel products with the vPro badge. The unpopularity of Intel NICs in consumer devices alone means that sideband access is rarely possible. vPro is also limited to relatively high-end processors and chipsets. The bad news is that you will have a hard time using AMT in your homelab, although some people certainly do. The upside is that the widely-reported "fact" that Intel ME is accessible via sideband networking on consumer devices is typically untrue, and for reasons beyond Intel software licensing.
That leaves an odd question around Intel ME itself, though, which is certainly OOB management-like but doesn't really have any OOB management features without AMT. So why do nearly all processors have it? Well, this is somewhat speculative, but the impression I get is that Intel ME exists mostly as a convenient way to host and manage trusted execution components that are used for things like Secure Boot and DRM. These features all run on the same processor as ME and share a common technology stack. The "management" portion of Intel ME is thus largely vestigial; today it's really part of the secure computing infrastructure.
This is not to make excuses for Intel ME, which is entirely unauditable by third parties and has harbored significant security vulnerabilities in the past. But, remember, we all use one processor architecture from one of two vendors, so Intel doesn't have a whole lot of motivation to do better. Lest you respond that ARM is the way, remember that modern ARM SoCs used in consumer devices have pretty much identical capabilities.
It is what it is.
[1] The definition of "headless" is sticky, and we shouldn't get too stuck on it. People tend to say "headless" to mean no monitor and keyboard attached, but keep in mind that slide-out rack consoles and IP KVMs have been common for a long time, and so in non-hyperscale environments truly headless machines are rarer than you would think. Part of this is because using a serial console is a monumental pain in the ass, so your typical computer operator will do a lot to avoid dealing with it. Before LCD displays, this meant a CRT and keyboard on an Anthro cart with wheels, but now that we are an enlightened society, you can cram a whole monitor and keyboard into 1U and get a KVM switching fabric that can cover the whole rack. Or swap cables. Mostly swap cables.