Moving beyond fork() + exec()

Original link: https://lwn.net/SubscriberLink/1076018/16f01bbbb8e0d1f0/

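Linux kernel developer Li Chen recently proposed "spawn templates", which aim to optimize the traditional `fork()` and `exec()` process-creation pattern. Although `fork()` has historically been regarded as elegant, it is expensive because it must copy the entire process state, most of which is then discarded by `exec()`. Chen's proposal aims to speed this up by letting an application cache the setup for an executable as a template, reducing the cost of launching the same command repeatedly.

Although the proposal showed a performance improvement of about 2%, it will not be accepted in its current form. Reviewers such as Mateusz Guzik argued that the `fork()` idiom should be retired entirely in favor of creating "pristine" processes. Christian Brauner suggested an alternative built on the `pidfd` abstraction: create an empty process and configure it through a new system call, analogous to `fsconfig()`.

Chen agreed with this direction and will focus future work there, which could lead to a native `posix_spawn()` implementation. Spawn templates will therefore not be implemented, but the proposal may have nudged Linux toward a cleaner, more efficient process-creation API.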

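The discussion around the article concerns the debate over the Unix `fork()` and `exec()` system-call pattern, sparked by the view, reported in a recent LWN article, that `fork()` is an outdated hack that is no longer practical.

**Main arguments:**

* **Historical background:** `fork()` was designed in the 1970s for systems with extremely limited memory, where it allowed a program to be swapped out to disk so that new code could run. Critics argue that the model has become a burden that forces operating-system design into rigid and inefficient patterns, such as heavy reliance on copy-on-write and memory overcommit.
* **Modern performance problems:** For large processes (server applications or memory-heavy programs such as Redis), `fork()` must walk and copy large page tables even with copy-on-write, which causes significant latency spikes.
* **Proposed alternatives:** Participants suggested moving to a "spawn, configure, exec" model in which the new process is created empty, avoiding the cost of copying the parent.
* **Counterarguments:** Defenders of the status quo stress that `fork()` is conceptually simple and elegant, and highly flexible for shell pipelines and process orchestration. Many argue that the so-called "modern" alternatives tend to be more complex and do not fully address the subtleties of configuring a process.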

Original text


By Jonathan Corbet
June 5, 2026

Since the earliest days of Unix, two of the core process-oriented system calls have been fork(), which creates a child process as a copy of the parent, and exec(), which runs a new program in the place of the current one. In Linux kernels, those system calls are better known as clone() and execve(), but the core functionality remains the same. While there is elegance to this process-creation model, there are shortcomings as well. A recent proposal from Li Chen to add "spawn templates" to the kernel will not be accepted in its current form, but it may point the way toward a new process-creation primitive in the future.

fork() is a relatively expensive system call; it must copy the entire process state (including memory) for the child process. Many optimizations have been made over the years, but a fork is still a fundamentally costly operation. To make things worse, a fork() call is often immediately followed by an exec(), which will discard all of that memory that was so carefully copied for the child. Attempts (such as vfork()) have been made over the years to optimize for this case, but the pattern still is more expensive than it could be.
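
As a point of reference, the traditional pattern described above can be sketched in C roughly as follows. This is a minimal illustration rather than code from the patch set; the helper name run_with_fork_exec() is hypothetical, and error handling is reduced to the essentials.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the article): run argv[0], looked up in
 * PATH, in a child process. Returns the child's exit status, or -1 if
 * the child could not be created or did not exit normally.
 */
int run_with_fork_exec(char *const argv[])
{
    int status;
    pid_t pid = fork();         /* the child starts as a copy of the parent */

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: throw away the copied image and load the new program. */
        execvp(argv[0], argv);
        _exit(127);             /* reached only if execvp() failed */
    }

    /* Parent: wait for the child and decode its wait status. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_with_fork_exec() with an argument vector such as { "git", "status", NULL } would run that command once; a program that does so repeatedly pays the cost of the fork() copy on every call, even though the child discards the copy almost immediately.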

Spawn templates

Chen's patch set takes an interesting approach to optimize the fork() and exec() pattern. It is focused on applications that repeatedly launch processes running the same executable; imagine, for example, a program that must run Git repeatedly to obtain information about the contents of a repository. In such cases, the program could establish a template to accelerate those invocations, spreading the setup cost across multiple operations. This template would be created with the spawn_template_create() system call:

    struct spawn_template_create_args {
        __aligned_u64 flags;
        __s32 execfd;
        __u32 exec_flags;
        __aligned_u64 filename;
        /* Some fields elided */
    };

    int spawn_template_create(struct spawn_template_create_args *args, size_t args_size);

This call will return a file descriptor representing a template for the executable file, which can be specified as either a file descriptor (execfd) or an absolute path (filename), but not both. To create the template, the kernel will open the indicated file and cache a bunch of information that will allow a process to run that file more quickly in the future.

The application in question may run a given executable many times, but each invocation is different in a number of ways. The details of a specific invocation must be placed into an instance of this structure:

    struct spawn_template_spawn_args {
        __aligned_u64 flags;
        __aligned_u64 pidfd;
        __aligned_u64 argv;
        __aligned_u64 envp;
        __aligned_u64 actions;
        __aligned_u64 actions_len;
        __aligned_u64 reserved[4];
    };

The argv field is a pointer to the argument list to be passed to the program, while envp points to its environment. Changes to file descriptors and signal handling, instead, are passed through actions, which is a pointer to an array of:

    struct spawn_template_action {
        __u32 type;
        __u32 flags;
        __s32 fd;
        __s32 newfd;
        __aligned_u64 arg;
    };

If, for example, file descriptor four should be closed in the child, the associated spawn_template_action structure would have type set to SPAWN_TEMPLATE_ACTION_CLOSE and fd set to four. Other actions exist for duplicating file descriptors, opening files, changing the working directory, and changing signal handling.
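
The "close file descriptor four" case might be filled in along the following lines. This is only a sketch under several assumptions: the structure is mirrored locally with <stdint.h> types because the proposed interface is not in any released kernel headers, the numeric value given to SPAWN_TEMPLATE_ACTION_CLOSE is a placeholder (the article does not give the real one), the helper make_close_action() is hypothetical, and leaving the unused fields at zero is a guess.

```c
#include <stdint.h>

/* Local stand-in for the kernel's __aligned_u64 (GCC/Clang attribute). */
typedef uint64_t aligned_u64 __attribute__((aligned(8)));

/* Placeholder value: the real constant comes from the patch set's headers. */
#define SPAWN_TEMPLATE_ACTION_CLOSE 1u

/* Local mirror of the structure shown in the article. */
struct spawn_template_action {
    uint32_t type;
    uint32_t flags;
    int32_t fd;
    int32_t newfd;
    aligned_u64 arg;
};

/* Hypothetical helper: describe "close this descriptor in the child". */
struct spawn_template_action make_close_action(int fd)
{
    struct spawn_template_action action = {
        .type = SPAWN_TEMPLATE_ACTION_CLOSE,
        .fd = fd,
        /* flags, newfd, and arg stay zero; assumed unused for a close */
    };
    return action;
}
```

An array of such structures, one per change to be made in the child, is what the actions field of spawn_template_spawn_args would point to.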

Once the spawn_template_spawn_args structure has been filled in, the new process can be run with:

    int spawn_template_spawn(int template_fd,
                             struct spawn_template_spawn_args *args, int args_size);

Internally, this system call follows something close to the normal fork()/exec() path. Chen is careful to point out that all of the normal checks applied when executing a new file remain in place. But the cached information in the template makes the whole process faster than it was before. How much faster? Benchmark results provided in the cover letter show an improvement of about 2%, which may not seem like a lot, but it may make a difference for applications that fit the expected pattern.

Toward posix_spawn()

The most detailed review of this work was posted by Mateusz Guzik, who said: "This problem is dear to my heart and I have been pondering it on and off for some time now. The entire fork + exec idiom is terrible and needs to be retired". He pointed out that the focus of the patch set was a bit strange in that it left the fork() part of the problem untouched. That is where most of the cost lies, he said, so optimization efforts should seek to remove it from the picture. Rather than copying the current process, "creating a pristine process is the way to go".

Christian Brauner was favorable toward the goal, saying: "The idea of having a builder api for exec isn't all that crazy". His suggestion, though, was that a new API should be built on top of the existing pidfd abstraction. Without getting into any degree of detail, he said that the right approach would be to create an option to pidfd_open() to create an empty process. A series of calls to a new pidfd_config() system call would then configure this new process as desired, setting up its environment, image to execute, and more. pidfd_config() would thus be analogous to fsconfig().

An important objective for a new interface, Brauner said, would be the ability to support an implementation of posix_spawn() in user space. posix_spawn() is well suited as a replacement for the fork()/exec() pattern; developers would likely welcome a native implementation that isn't (unlike the current implementation) hiding fork() and exec() under the covers. Chen agreed that the API as broadly sketched out by Brauner seemed better, and said that future work would be in that direction. So there will be no spawn templates in the Linux kernel but, if Chen's future work comes to fruition, Linux may finally gain a proper posix_spawn() implementation instead.
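
For readers who have not used it, the existing POSIX interface looks roughly like this from user space. The sketch below uses only standard posix_spawn() calls; the helper name run_with_posix_spawn() and its fd_to_close parameter are hypothetical, chosen to show how a per-child file-descriptor change is expressed as a "file action" rather than as code run between fork() and exec().

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Hypothetical helper (not from the article): run argv[0], looked up in
 * PATH, closing fd_to_close in the child if it is non-negative.
 * Returns the child's exit status, or -1 on failure.
 */
int run_with_posix_spawn(char *const argv[], int fd_to_close)
{
    posix_spawn_file_actions_t actions;
    pid_t pid;
    int status, err;

    if (posix_spawn_file_actions_init(&actions) != 0)
        return -1;

    /* Queue a "close this descriptor" step to be applied in the child. */
    if (fd_to_close >= 0 &&
        posix_spawn_file_actions_addclose(&actions, fd_to_close) != 0) {
        posix_spawn_file_actions_destroy(&actions);
        return -1;
    }

    /* Create the child, apply the queued actions, and run the program. */
    err = posix_spawnp(&pid, argv[0], &actions, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&actions);
    if (err != 0)
        return -1;

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The calling code never runs in the child itself, which is what makes the interface a plausible front end for a kernel primitive that starts from an empty process instead of a copy of the parent.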



