Booting Linux in QEMU and Writing PID 1 in Go to Illustrate Kernel as Program

Original link: https://serversfor.dev/linux-inside-out/the-linux-kernel-is-just-a-program/

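## Demystifying the Linux Kernel

Most introductory Linux tutorials focus on shell commands and treat the kernel as a hidden component. This exploration sets out to change that by showing that the kernel is just a binary that can be compiled and run.

The kernel acts as an abstraction layer between applications and hardware (CPU, memory, devices), providing a unified, secure interface. It manages resources, controls access, and provides features such as firewalls and file systems, acting as the computer's "runtime". It usually lives in the `/boot` directory, typically named `vmlinuz-*`.

The experiment boots this kernel directly with QEMU. The first attempt ends in a kernel panic, because the kernel needs a root filesystem to mount. This is solved by creating a minimal `initramfs`: a temporary filesystem containing a basic "init" program (written here in Go) that starts the system.

Booting successfully with the `initramfs` demonstrates a working, if very simple, Linux distribution. Key takeaways: the kernel is a file, a distribution is the kernel plus programs, and code runs either in "kernel space" (the kernel itself) or in "user space" (processes such as our Go program with PID 1). This hands-on approach builds a foundational understanding of how Linux systems work.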

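## Booting Linux in QEMU & the Kernel as a Program - Summary

A recent Hacker News post describes an experiment that boots a minimal Linux system in QEMU. Its central idea is to present the kernel as just another program. The author builds a basic system in which a Go program serves as the initial process (PID 1), emphasizing that even a complex operating system can be understood as a set of programs interacting with hardware.

The discussion centered on the role of `initramfs`, a filesystem loaded into memory during boot that handles tasks such as loading drivers and mounting the root filesystem, although it is *not required* for a minimal setup. Users shared their experiences building custom kernels (for example, with Gentoo) and the benefits of understanding low-level system components.

The author plans a series of articles that demystify Linux internals for developers, focusing on concepts such as system calls, files as resources, and user/group permissions. They chose Go because it is approachable while still allowing close interaction with the operating system, in contrast to the complexity of C. The goal is to bridge the gap between simply *using* Linux and understanding its underlying mechanisms.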

Original article

Most books and courses introduce Linux through shell commands, leaving the kernel as a mysterious black box doing magic behind the scenes.

In this post, we will run some experiments to demystify it: the Linux kernel is just a binary that you can build and run.

The experiments are designed so you can follow along if you have a Linux PC. This is completely optional, though: the goal is to build a mental model of how Linux works and to see how the components of the system fit together.

But first let’s talk about what a kernel is.

What is a kernel?

Computers are built from CPUs, memory, and other devices, like video cards, network cards, keyboards, displays, and a lot of other stuff.

These devices can be manufactured by different companies, have different capabilities, and can be programmed differently.

An operating system kernel provides an abstraction to use these devices and resources conveniently and securely. Without one, writing programs would be much more difficult. We would need to write the low-level code to use every device that our program needs, and it’s likely that it wouldn’t work on other computers.

A kernel

  • gives us APIs to interact with the hardware over a unified interface
  • manages how programs can use the computer’s CPU, memory and other resources
  • provides access control over which resources a program can access
  • provides additional features like firewalls, file systems, mechanisms for programs to communicate, etc.
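To make the first bullet concrete, here is a minimal Go sketch that is not from the original article. It skips the usual fmt and os helpers and talks to the kernel directly through the standard syscall package: write(2) puts bytes on standard output, and getpid(2) asks for the process ID. The program never drives the terminal or display hardware itself, because the kernel does that on its behalf. writeStdout is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"syscall"
)

// writeStdout hands bytes to the kernel with the write(2) system call.
// File descriptor 1 is standard output; the kernel decides whether that
// is a terminal, a pipe or a file.
func writeStdout(s string) (int, error) {
	return syscall.Write(1, []byte(s))
}

func main() {
	n, err := writeStdout("hello from a raw system call\n")
	if err != nil {
		panic(err)
	}
	// getpid(2): the kernel keeps track of every process and its ID.
	fmt.Println("bytes written:", n, "pid:", syscall.Getpid())
}
```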

The closest analogy from the software development world is that the kernel is a runtime for our computer.

Where is the kernel?

On most Linux distributions we will find the kernel under the /boot directory. Let’s enter the directory and list its contents:

```
~$ cd /boot
/boot$ ls -1
System.map-6.12.43+deb13-amd64
System.map-6.12.48+deb13-amd64
config-6.12.43+deb13-amd64
config-6.12.48+deb13-amd64
efi
grub
initrd.img-6.12.43+deb13-amd64
initrd.img-6.12.48+deb13-amd64
vmlinuz-6.12.43+deb13-amd64
vmlinuz-6.12.48+deb13-amd64
```

We see a few files here, but the one we are looking for is vmlinuz-6.12.48+deb13-amd64. This single file is the kernel.

If you ever wondered what this name means:

  • vmlinuz: vm for virtual memory, linux, and z indicating compression
  • 6.12.48+deb13: the kernel version (6.12.48) followed by a distribution tag (deb13, i.e. Debian 13)
  • amd64: this is the architecture of our system

Note: Different distributions may use slightly different naming conventions. vmlinuz is commonly the bootable compressed kernel image.
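The naming scheme can be illustrated with a small Go parser. This sketch assumes the Debian layout vmlinuz-<version>[+<distro>]-<arch>. parseImageName and imageName are hypothetical names and are not part of any library.

```go
package main

import (
	"fmt"
	"strings"
)

// imageName holds the parts of a Debian-style kernel image file name.
type imageName struct {
	Version string // upstream kernel version, e.g. "6.12.48"
	Distro  string // distribution tag after "+", e.g. "deb13" (may be empty)
	Arch    string // architecture suffix, e.g. "amd64"
}

// parseImageName splits names like "vmlinuz-6.12.48+deb13-amd64".
// It assumes the layout vmlinuz-<version>[+<distro>]-<arch>.
func parseImageName(name string) (imageName, error) {
	rest, ok := strings.CutPrefix(name, "vmlinuz-")
	if !ok {
		return imageName{}, fmt.Errorf("%q does not start with vmlinuz-", name)
	}
	i := strings.LastIndex(rest, "-")
	if i < 0 {
		return imageName{}, fmt.Errorf("%q has no architecture suffix", name)
	}
	var n imageName
	n.Arch = rest[i+1:]
	n.Version, n.Distro, _ = strings.Cut(rest[:i], "+")
	return n, nil
}

func main() {
	n, err := parseImageName("vmlinuz-6.12.48+deb13-amd64")
	if err != nil {
		panic(err)
	}
	fmt.Printf("version=%s distro=%s arch=%s\n", n.Version, n.Distro, n.Arch)
}
```

Running it prints version=6.12.48 distro=deb13 arch=amd64. Other distributions use different conventions. With Ubuntu's vmlinuz-6.8.0-45-generic, for example, the last field is a kernel flavour rather than an architecture, so the assumption above does not hold there.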

Let’s start the kernel

In our first experiment we will copy this kernel into another directory and run it.

First, let’s create a directory and copy the kernel there.

Note: Your kernel version might differ, remember to check it before the cp command.
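In the shell, uname -r prints the release of the running kernel. The kernel exposes the same string in /proc/sys/kernel/osrelease, so a short Go sketch can locate the matching image. It assumes the Debian-style /boot/vmlinuz-<release> naming shown above. imagePath is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// imagePath builds the Debian-style image path for a release string such
// as "6.12.48+deb13-amd64". Other distributions may name images differently.
func imagePath(release string) string {
	return "/boot/vmlinuz-" + strings.TrimSpace(release)
}

func main() {
	// /proc/sys/kernel/osrelease holds the same string that `uname -r` prints.
	b, err := os.ReadFile("/proc/sys/kernel/osrelease")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read kernel release:", err)
		os.Exit(1)
	}
	path := imagePath(string(b))
	if _, err := os.Stat(path); err != nil {
		fmt.Println("no image at", path, "; check /boot manually")
		return
	}
	fmt.Println("running kernel image:", path)
}
```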

```
/boot$ cd
~$ mkdir linux-inside-out
~$ cd linux-inside-out/
~/linux-inside-out$ cp /boot/vmlinuz-6.12.48+deb13-amd64 .
~/linux-inside-out$ ls -lh
total 12M
-rw-r--r-- 1 zsoltkacsandi zsoltkacsandi 12M Dec  1 09:44 vmlinuz-6.12.48+deb13-amd64
```

Then install some tools that are needed for this experiment.

We will use QEMU, a virtual machine emulator, because our kernel needs something that works like a computer, and because we do not want to mess up our original operating system.

```
~$ sudo apt update
~$ sudo apt install -y qemu-system-x86 qemu-utils
```

Then start a virtual machine with our kernel:

```
~/linux-inside-out$ qemu-system-x86_64 \
  -m 256M \
  -kernel vmlinuz-6.12.48+deb13-amd64 \
  -append "console=ttyS0" \
  -nographic
```

The output should be something like this:

```
SeaBIOS (version 1.16.3-debian-1.16.3-2)
iPXE (https://ipxe.org) 00:03.0 CA00 PCI2.10 PnP PMM+06FC6D30+06F06D30 CA00
Booting from ROM...
Probing EDD (edd=off to disable)... o
[    0.000000] Linux version 6.12.48+deb13-amd64 ([email protected]) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, )
[    0.000000] Command line: console=ttyS0
...
[    2.055627] RAS: Correctable Errors collector initialized.
[    2.161843] clk: Disabling unused clocks
[    2.162218] PM: genpd: Disabling unused power domains
[    2.179652] /dev/root: Can't open blockdev
[    2.180871] VFS: Cannot open root device "" or unknown-block(0,0): error -6
[    2.181038] Please append a correct "root=" boot option; here are the available partitions:
[    2.181368] List of all bdev filesystems:
[    2.181477]  fuseblk
[    2.181516]
[    2.181875] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[    2.182495] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.48+deb13-amd64 #1  Debian 6.12.48-1
[    2.182802] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
...
[    2.186426] Kernel Offset: 0x30e00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[    2.186949] ---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) ]---
```

You can exit by pressing Ctrl + A then X.
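If you would rather drive QEMU from Go, the command used above can be assembled with os/exec. This sketch is not part of the original article. qemuArgs is a hypothetical helper, and the comments summarize what each QEMU flag does. The initrd parameter anticipates the next experiment and is left empty here.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// qemuArgs assembles the QEMU flags used in this article.
// initrd may be empty, in which case no initramfs is passed.
func qemuArgs(kernel, initrd, cmdline string) []string {
	args := []string{
		"-m", "256M", // give the guest 256 MiB of RAM
		"-kernel", kernel, // boot this kernel image directly, without a disk or bootloader
	}
	if initrd != "" {
		args = append(args, "-initrd", initrd) // load an initramfs next to the kernel
	}
	return append(args,
		"-append", cmdline, // the kernel command line
		"-nographic", // no window; the guest's serial port is connected to this terminal
	)
}

func main() {
	args := qemuArgs("vmlinuz-6.12.48+deb13-amd64", "", "console=ttyS0")
	cmd := exec.Command("qemu-system-x86_64", args...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "qemu:", err)
		os.Exit(1)
	}
}
```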

So, we’ve just started the same kernel that is running on our computer. It took 2 seconds, it printed a lot of log messages, then panicked.

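This panic is not a bug; it is expected. Once the kernel has initialized itself, it tries to mount the root filesystem and hand over control to a program called init. We did not give it any root filesystem (note the empty root device "" in the log), so there is nothing to mount and no init to run.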
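The decision the kernel makes at this point can be modeled roughly in Go. This is a heavily simplified sketch, loosely based on kernel_init() in the kernel's init/main.c. bootState and pickInit are hypothetical names. The real kernel checks whether files can actually be executed rather than merely present, and details such as the CONFIG_DEFAULT_INIT build option are left out.

```go
package main

import (
	"errors"
	"fmt"
)

// bootState is a hypothetical, heavily simplified view of the kernel at the end of boot.
type bootState struct {
	Initramfs     map[string]bool // files unpacked from the initramfs (nil if none was given)
	RdInit        string          // rdinit= value; "" means the default "/init"
	RootMountable bool            // could the kernel mount the device named by root=?
	RootFS        map[string]bool // files on the mounted root filesystem
	Init          string          // init= value; "" means try the built-in fallbacks
}

// pickInit returns the program the kernel would run as PID 1, or the panic
// it would hit. Loosely modeled on kernel_init(); not actual kernel code.
func pickInit(s bootState) (string, error) {
	rd := s.RdInit
	if rd == "" {
		rd = "/init"
	}
	if s.Initramfs[rd] {
		return rd, nil // an init inside the initramfs wins; no disk is mounted
	}
	if !s.RootMountable {
		return "", errors.New("VFS: Unable to mount root fs") // our first experiment
	}
	if s.Init != "" {
		if s.RootFS[s.Init] {
			return s.Init, nil
		}
		return "", fmt.Errorf("Requested init %s failed", s.Init)
	}
	for _, p := range []string{"/sbin/init", "/etc/init", "/bin/init", "/bin/sh"} {
		if s.RootFS[p] {
			return p, nil
		}
	}
	return "", errors.New("No working init found")
}

func main() {
	_, err := pickInit(bootState{})
	fmt.Println("no initramfs, no root:", err)
	p, _ := pickInit(bootState{Initramfs: map[string]bool{"/init": true}})
	fmt.Println("with an /init in the initramfs:", p)
}
```

With no initramfs and no root= device, the model ends with the same "VFS: Unable to mount root fs" error as the log above. With an /init inside the initramfs, it picks /init.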

So let’s give the kernel an init program to run. We will write a simple one ourselves.

We will use Golang for two reasons:

  • it has an easy-to-learn syntax that readers coming from different backgrounds can pick up and understand quickly
  • it can build a statically-linked binary with no C dependencies, making it portable and perfect for our minimal experiment

First, let’s install Golang and create a new project called init:

```
~/linux-inside-out$ sudo apt -y install golang
~/linux-inside-out$ mkdir init
~/linux-inside-out$ cd init
~/linux-inside-out/init$
~/linux-inside-out/init$ go mod init init
go: creating new go.mod: module init
go: to add module requirements and sums:
	go mod tidy
```

Create a new file called main.go:

```go
package main

import (
	"fmt"
	"os"
	"time"
)

func main() {
	fmt.Println("Hello from Go init!")
	fmt.Println("PID:", os.Getpid()) // print our process ID

	// Print "tick N" every two seconds, forever. The loop never ends on
	// purpose: if PID 1 exits, the kernel panics.
	for i := 0; ; i++ {
		fmt.Println("tick", i)
		time.Sleep(2 * time.Second)
	}
}
```

Then build the program and run it:

```
~/linux-inside-out/init$ CGO_ENABLED=0 go build -o init .
~/linux-inside-out/init$ ./init
Hello from Go init!
PID: 3086
tick 0
tick 1
```

Press Ctrl + C to stop it.

As you can see, this is a regular program that got the PID 3086 and prints some text to the output. There is nothing special about it.

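Note: If you use another programming language for this experiment, you will need to compile a statically-linked binary. Without that, the following parts of the experiment will not work, because our minimal filesystem contains no dynamic loader or shared libraries for the program to use.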
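To check whether a binary is statically linked, you can run file or ldd on it. The same check can also be sketched in Go with the standard debug/elf package. This assumes a Linux ELF executable. A dynamically linked executable carries a PT_INTERP program header naming the dynamic loader, and a static one has none. isStatic and the checkstatic program name are hypothetical.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
)

// isStatic reports whether the ELF executable at path is statically linked.
// A dynamically linked executable has a PT_INTERP program header naming its
// dynamic loader (e.g. /lib64/ld-linux-x86-64.so.2); a static one has none.
func isStatic(path string) (bool, error) {
	f, err := elf.Open(path)
	if err != nil {
		return false, err
	}
	defer f.Close()
	for _, p := range f.Progs {
		if p.Type == elf.PT_INTERP {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: checkstatic <binary>")
		os.Exit(2)
	}
	static, err := isStatic(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("statically linked:", static)
}
```

When pointed at init/init built with CGO_ENABLED=0, it should print statically linked: true.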

Now we create a simple initramfs (initial RAM filesystem). When the kernel starts, it may not yet have loaded the drivers it needs to access the computer's disks, so it can be given a small filesystem that is loaded into memory along with it: the initramfs. Ours will contain everything our tiny system needs, so no disk is involved at all.

```
~/linux-inside-out$ mkdir -p rootfs/{proc,sys,dev}
~/linux-inside-out$ cp ./init/init rootfs/init
~/linux-inside-out$ sudo mknod rootfs/dev/console c 5 1
~/linux-inside-out$ sudo mknod rootfs/dev/null c 1 3
```

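The cp and mkdir commands might be familiar. The mknod command creates special files, called device nodes, that programs use to communicate with hardware devices through the kernel. The c means a character device, and the two numbers are the device's major and minor numbers: 5 1 is the console, and 1 3 is the null device.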
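The same two device nodes can be created from Go through the mknod(2) system call. This sketch assumes Linux and must run as root, just like sudo mknod. It uses a hypothetical mkdev helper that packs the major and minor numbers the same way glibc's makedev() does.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// mkdev packs a major/minor pair into a Linux dev_t, using the same
// encoding as glibc's makedev(). For small numbers this is major<<8 | minor.
func mkdev(major, minor uint32) uint64 {
	return (uint64(major)&0xfffff000)<<32 | (uint64(major)&0x00000fff)<<8 |
		(uint64(minor)&0xffffff00)<<12 | uint64(minor)&0x000000ff
}

func main() {
	nodes := []struct {
		path         string
		perm         uint32
		major, minor uint32
	}{
		{"rootfs/dev/console", 0o600, 5, 1}, // the system console
		{"rootfs/dev/null", 0o666, 1, 3},    // discards everything written to it
	}
	for _, n := range nodes {
		// S_IFCHR marks a character device; like `mknod`, this needs root.
		err := syscall.Mknod(n.path, syscall.S_IFCHR|n.perm, int(mkdev(n.major, n.minor)))
		if err != nil {
			fmt.Fprintln(os.Stderr, n.path, err)
			os.Exit(1)
		}
	}
}
```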

Our root filesystem directory structure looks like this:

```
|-- dev             # dev (devices) directory
|   |-- console     # console device
|   `-- null        # null device
|-- init            # our Golang program
|-- proc            # empty; mount point for the proc filesystem
`-- sys             # empty; mount point for the sysfs filesystem
```
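The proc and sys directories stay empty in this experiment. They are conventional mount points for two virtual filesystems that the kernel generates on the fly: proc (process information) and sysfs (devices and kernel objects). A typical init mounts them early. The Go sketch below is not part of the original article. It shows how our init program could do this with the mount(2) system call, using a hypothetical mountPseudoFS helper that takes the mount function as a parameter so it can be tested without root.

```go
package main

import (
	"fmt"
	"syscall"
)

// mountFunc matches the signature of syscall.Mount so it can be swapped for testing.
type mountFunc func(source, target, fstype string, flags uintptr, data string) error

// mountPseudoFS mounts the kernel's virtual filesystems on the empty
// /proc and /sys directories of our rootfs.
func mountPseudoFS(mount mountFunc) error {
	for _, m := range []struct{ source, target, fstype string }{
		{"proc", "/proc", "proc"},  // process information, e.g. /proc/1/status
		{"sysfs", "/sys", "sysfs"}, // devices and kernel objects
	} {
		if err := mount(m.source, m.target, m.fstype, 0, ""); err != nil {
			return fmt.Errorf("mount %s on %s: %w", m.fstype, m.target, err)
		}
	}
	return nil
}

func main() {
	// Intended to run as PID 1 inside the VM. On a normal system it needs root
	// and would mount over the host's /proc and /sys, so don't run it there.
	if err := mountPseudoFS(syscall.Mount); err != nil {
		fmt.Println(err)
	}
}
```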

Now let’s package the files into an archive called initramfs.img. The kernel expects the initramfs to be a cpio archive in the newc format, hence -H newc:

```
( cd rootfs && find . | cpio -H newc -o ) > initramfs.img
```

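Then start the virtual machine again, this time with both the kernel and the initramfs. The -initrd option hands the archive to the kernel, and rdinit=/init tells the kernel which program inside it to run as init (/init is also the default):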

```
qemu-system-x86_64 \
  -m 256M \
  -kernel vmlinuz-6.12.48+deb13-amd64 \
  -initrd initramfs.img \
  -append "console=ttyS0 rdinit=/init" \
  -nographic
```

The output should look similar to this:

```
SeaBIOS (version 1.16.3-debian-1.16.3-2)
iPXE (https://ipxe.org) 00:03.0 CA00 PCI2.10 PnP PMM+0EFC6D30+0EF06D30 CA00
Booting from ROM...
Probing EDD (edd=off to disable)... o
[    0.000000] Linux version 6.12.48+deb13-amd64 ([email protected]) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PR)
[    0.000000] Command line: console=ttyS0 rdinit=/init
...
[    1.922229] RAS: Correctable Errors collector initialized.
[    2.158525] clk: Disabling unused clocks
[    2.158865] PM: genpd: Disabling unused power domains
[    2.264545] Freeing unused decrypted memory: 2028K
[    2.327128] Freeing unused kernel image (initmem) memory: 4148K
[    2.406015] Write protecting the kernel read-only data: 28672k
[    2.407968] Freeing unused kernel image (rodata/data gap) memory: 488K
[    2.555150] x86/mm: Checked W+X mappings: passed, no W+X pages found.
[    2.557822] tsc: Refined TSC clocksource calibration: 2903.977 MHz
[    2.558399] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x29dbf0142be, max_idle_ns: 440795300983 ns
[    2.565700] clocksource: Switched to clocksource tsc
[    2.672446] Run /init as init process
Hello from Go init!
PID: 1
tick 0
tick 1
tick 2
```

Our kernel booted normally and then started our Go program as the init process. A program that is running is called a process.

There are a few important points to note here:

  • Our Go program got process ID 1 (PID: 1). PID 1 is the first user-space process to start, and it is called the init process. The purpose of the init process is to start the other programs that need to be running for the operating system.
  • Up until the "Run /init as init process" line, we are in kernel space. Once the init process starts, we enter user space.
  • We have just built a (rather simple) Linux distribution. Two files, that’s it. A Linux distribution is really just a Linux kernel, a bunch of programs and config files packaged together.
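Starting other programs is only half of an init's job. Orphaned processes are re-parented to PID 1, and a process that has exited stays a zombie until its parent waits for it, so an init also has to reap children. The Go sketch below is not from the original article, and startAndReap is a hypothetical helper. It starts a child and calls wait4(2) for any child until it has collected its own. It also assumes a shell exists at /bin/sh, which our two-file initramfs does not have.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
	"time"
)

// startAndReap starts a child program, then waits for any child to exit
// until the one it started has been collected, reporting other
// (re-parented) children it reaps along the way.
func startAndReap(name string, args ...string) (syscall.WaitStatus, error) {
	cmd := exec.Command(name, args...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Start(); err != nil {
		return 0, err
	}
	for {
		var ws syscall.WaitStatus
		pid, err := syscall.Wait4(-1, &ws, 0, nil) // -1 means "any child"
		if err == syscall.EINTR {
			continue // interrupted by a signal; just wait again
		}
		if err != nil {
			return 0, err
		}
		if pid == cmd.Process.Pid {
			return ws, nil
		}
		fmt.Println("reaped orphan", pid)
	}
}

func main() {
	ws, err := startAndReap("/bin/sh", "-c", "echo hello from a child; exit 3")
	if err != nil {
		fmt.Println("could not run child:", err)
	} else {
		fmt.Println("child exited with status", ws.ExitStatus())
	}
	for {
		time.Sleep(time.Hour) // PID 1 must never exit, or the kernel panics
	}
}
```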

What we have learnt so far

We have already learnt quite a few important concepts that are essential to understand Linux systems:

  • The Linux kernel is a single file of a few megabytes sitting on your disk
  • A Linux distribution is just a kernel and a set of other programs and config files
  • A process is a program that is under execution
  • PID is the process ID
  • What the init process is
  • We familiarized ourselves with the concepts of kernel space and user space