I am making steady progress towards moving the Computers Are Bad enterprise cloud to its new home, here in New Mexico. One of the steps in this process is, of course, purchasing a new server... the current Big Iron is getting rather old (probably about a decade!) and here in town I'll have the rack space for more machines anyway.
In our modern, cloud-centric industry, it is rare that I find myself comparing the specifications of a Dell PowerEdge against an HP ProLiant. Because the non-hyperscale server market has increasingly consolidated around Intel specifications and reference designs, it is even rarer that there is much of a difference between the major options.
This brings back to mind one of those ancient questions that comes up among computer novices and becomes a writing prompt for technology bloggers. What is a server? Is it just, like, a big computer? Or is it actually special?
There's a lot of industrial history wrapped up in that question, and the answer is often very context-specific. But there are some generalizations we can make about the history of the server: client-server computing originated mostly as an evolution of time-sharing computing using multiple terminals connected to a single computer. There was no expectation that terminals shared an architecture with the computers they connected to (and indeed they were usually vastly simpler machines), and that attitude carried over to client-server systems. The PC revolution instilled a Wintel monoculture in much of client-side computing by the mid-'90s, but it remained common into the '00s for servers to run entirely different operating systems and architectures.
The SPARC and Solaris combination was very common for servers, as were IBM's minicomputer architectures and their numerous operating systems. Indeed, one of the key commercial contributions of Java was the way it allowed enterprise applications to be written for a Solaris/SPARC backend while enabling code reuse for clients that ran on either stalwarts like Unix/RISC or "modern" business computing environments like Windows/x86. This model was sometimes referred to as client-server computing with "thick clients." It preserved the differentiation between "server" and "client" as classes of machines, and the universal adherence of serious business software to this model led to an association between server platforms and "enterprise computing."
Over time, things have changed, as they always do. Architectures that had been relegated to servers became increasingly niche and struggled to compete with the PC architecture on cost and performance. The general architecture of server software shifted away from vertical scaling and high-uptime systems to horizontal scaling with relaxed reliability requirements, taking away much of the advantage of enterprise-class computers. For the most part, today, a server is just a big computer. There are some distinguishing features: servers are far more likely to be SMP or NUMA, with multiple processor sockets. While the days of SAS and hardware RAID are increasingly behind us, servers continue to have more complex storage controllers and topologies than clients. And servers, almost by definition, offer some sort of out-of-band management.
Out-of-band management, sometimes also called lights-out management, refers to a capability that is almost unheard of in clients. A separate, smaller management computer allows for remote access to a server even when it is, say, powered off. The terms out-of-band and in-band in this context emerge from their customary uses in networking and telecom, meaning that out-of-band management is performed without the use of the standard (we might say "data plane") network connection to a machine. But in practice they have drifted in meaning, and it is probably better to think of out-of-band management as meaning that the operating system and general-purpose components are not required. This might be made clearer by comparison: a very standard example of in-band management would be SSH, a service provided by the software on a computer that allows you to interact with it. Out-of-band management, by contrast, is provided by a dedicated hardware and software stack and does not require the operating system or, traditionally, even the CPU to cooperate.
You can imagine that this is a useful capability. Today, out-of-band management is probably best exemplified by the remote console that most servers offer. It's basically an embedded IP KVM, allowing you to interact with the machine as if you were at a locally connected monitor and keyboard. A lot of OOB management products also offer "virtual media," where you can upload an ISO file to the management interface and then have it appear to the computer proper as if it were a physical device. This is extremely useful for installing operating systems.
OOB management is an interesting little corner of computer history. It's not a new idea at all; in fact, similar capabilities can be found through pretty much the entire history of business computing. If anything, it's gotten simpler and more boring over time. A few evenings ago I was watching a clabretro video about an IBM p5 he's gotten working. As is the case in most of his videos about servers, he has to give a brief explanation of the multiple layers of lower-level management systems present in the p5 and their various textmode and web interfaces.
If we constrain our discussion of "servers" to relatively modern machines, starting say in the late '80s or early '90s, there are some common features:
- Some sort of local operator interface (this term itself being a very old one), like an LCD matrix display or grid of LED indicators, providing low-level information on hardware health.
- A serial console with access to the early bootloader and a persistent low-level management system.
- A higher-level management system, with a variable position in the stack depending on architecture, for remote management of the machine workload.
A lot of this stuff still hangs around today. Most servers can tell you on the front panel if a redundant component like a fan or power supply has failed, although the number of components that are redundant and can be replaced online has dwindled with time from "everything up to and including CPUs" on '90s prestige architectures to sometimes little more than fans. Serial management is still pretty common, mostly as a holdover from its days as a popular way to do OS installation and maintenance on headless machines [1].
But for the most part, OOB management has consolidated in the exact same way as processor architecture: onto Intel IPMI.
IPMI is confusing to some people for a couple of reasons. First, IPMI is a specification, not an implementation. Most major vendors have their own implementation of IPMI, often with features above and beyond the core IPMI spec, and they give them weird names like HP iLO and Dell DRAC. These vendor-specific implementations often predate IPMI, too, so it's never quite right to say they are "just IPMI." They're independent systems with IPMI characteristics. On the other hand, smaller, more upstart manufacturers are likelier to just call it IPMI, in which case it may just be the standard offering from their firmware vendor.
Further confusing matters is a fair amount of terminological overlap. The IPMI software runs on a processor conventionally called the baseboard management controller or BMC, and the terms IPMI and BMC are sometimes used interchangeably. Lights-out management or LOM is mostly an obsolete term but sticks around because HP(E) is a fan of it and continues to call their IPMI implementation Integrated Lights-Out. The BMC should not be confused with the System Management Controller or SMC, which is one of a few terms used for a component present in client computers to handle tasks like fan speed control. These have an interrelated history and, indeed, the BMC handles those functions in most servers.
IPMI also specifies two interfaces: an out-of-band interface available over the network or a serial connection, and an in-band interface available to the operating system via a driver (and, in practice, I believe communication between the CPU and the baseboard management controller usually happens via the low-pin-count or LPC bus, a weird little holdover of ISA present in most modern computers). The result is that you can interact with the IPMI from a tool running in the operating system, like ipmitool on Linux. That can make it a little confusing to tell what exactly is going on, if you don't understand that the IPMI is a completely independent system that has a local interface to the running operating system for convenience.
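To make the two paths concrete, here's a minimal sketch of both, wrapping ipmitool in Python's subprocess module. The address and credentials are made up, and I'm assuming a Linux box with the IPMI driver loaded for the in-band case:

    import subprocess

    def ipmi(args, host=None, user=None, password=None):
        # With a host, go out-of-band: RMCP+ ("lanplus") straight to
        # the BMC's own network interface, no cooperation from the
        # target's operating system required.
        cmd = ["ipmitool"]
        if host:
            cmd += ["-I", "lanplus", "-H", host, "-U", user, "-P", password]
        # Without a host, ipmitool takes the in-band path: the OS's
        # IPMI driver (/dev/ipmi0 on Linux) talking to the local BMC.
        cmd += args
        return subprocess.run(cmd, capture_output=True, text=True,
                              check=True).stdout

    # The same question asked both ways, of the same BMC:
    print(ipmi(["mc", "info"]))                                   # in-band
    print(ipmi(["mc", "info"], "10.0.0.42", "ADMIN", "hunter2"))  # out-of-band

Either way you're talking to the same independent system; only the path to it differs.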
What does the IPMI actually do? Well, like most things, it's mostly become a webapp. Web interfaces are just too convenient to turn down, so while a lot of IPMI products do have dedicated client software, they're porting all the features into an embedded web application. The quality of these web interfaces varies widely but is mostly not very good. That raises a question, of course, of how you get to the IPMI web interface.
Most servers on the market have a dedicated ethernet interface for the IPMI, often labelled "IPMI" or "management" or something like that. Most people would agree that the best way to use IPMI is to put the management network interface onto a dedicated physical network, for reasons of both security and reliability (IPMI should remain accessible even in case of performance or reliability problems with your main network). A dedicated physical network costs time, space, and money, though, so there are compromises. For one, your "management network" is very likely to be a VLAN on your normal network equipment. That's sort of like what AT&T calls a common-carrier switching arrangement, meaning that it behaves like an independent, private network but shares all of the actual equipment with everything else, the isolation being implemented in software. That was a weird comparison to make and I probably just need to write a whole article on CCSAs like I've been meaning to.
Even that approach requires extra cabling, though, so IPMI offers "sideband" networking. With sideband management, the BMC communicates directly with the same NIC that the operating system uses. The implementation is a little bit weird: the NIC will pretend to be two different interfaces, mixing IPMI traffic into the same packet stream as host traffic but using a different MAC address. This way, it appears to other network equipment as if there were two separate network interfaces in use, just as there would be with a dedicated management port. I will leave judgment as to how good of an idea this is to you, but there are obvious security considerations around reducing the segregation between IPMI and application traffic.
And yes, it should be said, a lot of IPMI implementations have proven to be security nightmares. They should never be accessible to any untrusted person.
Details of network features vary between IPMI implementations, but there is a standard interface, RMCP, on UDP port 623 that can be used for discovery and basic commands. There's often SSH and a web interface, and VNC is pretty common for remote console.
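That discovery mechanism is the RMCP "presence ping" inherited from the ASF specification. If I have the byte layout right, the probe is small enough to build by hand; this sketch (hypothetical target address) just asks UDP 623 whether a BMC is listening:

    import socket

    # RMCP presence ping: a 4-byte RMCP header (version 6, reserved,
    # sequence 0xFF meaning "no ack", class 6 = ASF), then an ASF
    # message (IANA enterprise 4542, message type 0x80 = ping).
    PING = bytes([0x06, 0x00, 0xFF, 0x06,
                  0x00, 0x00, 0x11, 0xBE,
                  0x80, 0x00, 0x00, 0x00])

    def bmc_present(host, timeout=2.0):
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.settimeout(timeout)
            s.sendto(PING, (host, 623))
            try:
                reply, _ = s.recvfrom(512)
            except socket.timeout:
                return False
        # A conforming BMC answers with a presence pong, type 0x40.
        return len(reply) > 8 and reply[8] == 0x40

    print(bmc_present("10.0.0.42"))  # hypothetical management address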
There are some neat basic functions you can perform with the IPMI, either over the network or locally using an in-band IPMI client. A useful one, if you are forgetful and keep poor records like I do, is listing the hardware modules making up the machine at an FRU or vendor part number level. You can also interact with basic hardware functions like sensors, power state, fans, etc. IPMI offers a standard watchdog timer, which can be combined with software running on the operating system to ensure that the server will be reset if the application gets into an unhealthy state. You should set a timeout long enough to allow the system to boot and for you to connect and disable the watchdog timer; ask me how I know.
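The watchdog pattern looks something like the sketch below: software pets the timer while a health check passes, and if the whole machine wedges, the BMC's countdown expires and it takes the configured action (usually a hard reset). This assumes in-band ipmitool and a watchdog that has already been armed with that generous timeout; healthy() is a stand-in for whatever your application considers proof of life:

    import subprocess, time

    def healthy():
        # Stand-in: check your application however makes sense
        # (hit a local health endpoint, check a queue depth, etc.)
        return True

    while True:
        if healthy():
            # "mc watchdog reset" restarts the BMC's countdown from
            # the top. If this loop, the OS, or the health check dies,
            # the timer runs out and the BMC resets the machine.
            subprocess.run(["ipmitool", "mc", "watchdog", "reset"],
                           check=True)
        time.sleep(30)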
One of the reasons I thought to write about IPMI is its strange relationship to the world of everyday client computers. IPMI is very common in enterprise servers but very rare elsewhere, much to the consternation of people like me who don't have the space or noise tolerance for a 1U pizzabox in their homes. If you are trying to stick to compact or low-power computers, you'll pretty much have to go without.
But then, there's kind of a weird exception. What about Intel ME and AMD ST? These are essentially OOB management controllers that are present in virtually all Intel and AMD processors. This is kind of an odd story. Intel ME, the Management Engine, is an enabling component of Intel Active Management Technology (Intel AMT). AMT was pretty much an attempt at popularizing OOB management for client machines, and it offers most of the same capabilities as IPMI. It has been considerably less successful. Most of that is probably due to pricing: Intel has limited almost all AMT features to use with their very costly enterprise management platforms. Perhaps there is some industry in which these sell well, but I am apparently not in it. There are open-source AMT clients, but the next problem you will run into is finding a machine where AMT is actually usable.
The fact that Intel AMT has sideband management capability, and that therefore the Intel ME component on which AMT runs has sideband management capability, was the topic of quite some consternation in the security community. Here is a mitigating factor: sideband management is only possible if the processor, motherboard chipset, and NIC are all AMT-capable. Options for all three devices are limited to Intel products with the vPro badge. The unpopularity of Intel NICs in consumer devices alone means that sideband access is rarely possible. vPro is also limited to relatively high-end processors and chipsets. The bad news is that you will have a hard time using AMT in your homelab, although some people certainly do. The upside is that the widely-reported "fact" that Intel ME is accessible via sideband networking on consumer devices is typically untrue, and for reasons beyond Intel software licensing.
That leaves an odd question around Intel ME itself, though, which is certainly OOB management-like but doesn't really have any OOB management features without AMT. So why do nearly all processors have it? Well, this is somewhat speculative, but the impression I get is that Intel ME exists mostly as a convenient way to host and manage trusted execution components that are used for things like Secure Boot and DRM. These features all run on the same processor as ME and share a common technology stack. The "management" portion of Intel ME is thus largely vestigial; today it's really part of the secure computing infrastructure.
This is not to make excuses for Intel ME, which is entirely unauditable by third parties and has harbored significant security vulnerabilities in the past. But, remember, we all use one processor architecture from one of two vendors, so Intel doesn't have a whole lot of motivation to do better. Lest you respond that ARM is the way, remember that modern ARM SoCs used in consumer devices have pretty much identical capabilities.
It is what it is.
[1] The definition of "headless" is sticky, and we shouldn't get too stuck on it. People tend to say "headless" to mean no monitor and keyboard attached, but keep in mind that slide-out rack consoles and IP KVMs have been common for a long time, and so in non-hyperscale environments truly headless machines are rarer than you would think. Part of this is because using a serial console is a monumental pain in the ass, so your typical computer operator will do a lot to avoid dealing with it. Before LCD displays, this meant a CRT and keyboard on an Anthro cart with wheels, but now that we are an enlightened society, you can cram a whole monitor and keyboard into 1U and get a KVM switching fabric that can cover the whole rack. Or swap cables. Mostly swap cables.