IPC medley: message-queue peeking, io_uring, and bus1

Original link: https://lwn.net/Articles/1065490/

## Linux kernel development: interprocess communication updates - LWN summary

This LWN article reports on several ongoing efforts to improve interprocess communication (IPC) in the Linux kernel. Existing IPC mechanisms often fall short of what users need, and new proposals have appeared as a result.

One effort enhances POSIX message queues through `mq_timedreceive2()`, a new system call intended to add features such as peeking at a message without removing it from the queue and selecting a specific message by index. It must, however, work within the architecture-imposed limits on system-call arguments.

Another proposal introduces a new IPC scheme in the `io_uring` subsystem, aiming for a high-bandwidth mechanism similar to D-Bus by using shared ring buffers. While promising, it is still at an early stage and needs substantial development, including addressing questions about completeness and its LLM-assisted code generation.

Finally, "bus1", a kernel-mediated IPC subsystem first proposed in 2016, has reappeared, now rewritten in Rust. It focuses on message and capability passing, but the current priority is solidifying the Rust integration before broader exposure within the kernel.

Together, these developments show the continuing effort to refine and extend the Linux kernel's IPC options to serve needs ranging from monitoring tools to advanced checkpoint/restore systems.


By Jonathan Corbet
April 2, 2026

The kernel provides a number of ways for processes to communicate with each other, but they never quite seem to fit the bill for many users. There are currently a few proposals for interprocess communication (IPC) enhancements circulating on the mailing lists. The most straightforward one adds a new system call for POSIX message queues that enables the addition of new features. For those wanting an entirely new way to do interprocess communication, there is a proposal to add a new subsystem for that purpose to io_uring. Finally, the bus1 proposal has made a return after ten years.

Peeking at message queues

The POSIX message-queue API is not heavily used, but there are users out there who care about how well it works. Message queues are named objects that, by default, all share a global namespace, though IPC namespaces can be used to separate them. There is a whole set of system calls for the creation, configuration, use, and destruction of message queues; see the mq_overview man page for an introduction to this subsystem.

Of interest here is mq_timedreceive(), which can be used to receive messages from a message queue:

    ssize_t mq_timedreceive(mqd_t mqdes, char *msg_ptr,
                            size_t msg_len, unsigned int *msg_prio,
                            const struct timespec *abs_timeout);

This call will receive the highest-priority message pending in the queue described by mqdes (which is a file descriptor on Linux systems) into the buffer pointed to by msg_ptr, which must be at least msg_len bytes in length. If abs_timeout is not null, it specifies how long the call should block before returning a timeout error. On successful receipt of a message, the location pointed to by msg_prio (if non-null) will be set to the priority of the received message.

That system call has a fair number of parameters, but Mathura Kumar would like to add some more. Since mq_timedreceive() was not designed for extensibility, that means adding a new system call. Thus, Kumar's patch set adding mq_timedreceive2(). But there is an additional constraint here: there are architecture-imposed limits on the number of arguments that can be passed to system calls, and Kumar's plans would exceed those limits. As a result, the new system call is defined as:

    struct mq_timedreceive2_args {
        size_t         msg_len;
        unsigned int  *msg_prio;
        char          *msg_ptr;
    };

    ssize_t mq_timedreceive2(mqd_t mqdes,
                             struct mq_timedreceive2_args *uargs,
                             unsigned int flags,
                             unsigned long index,
                             const struct timespec *abs_timeout);

The msg_len, msg_prio, and msg_ptr arguments have been moved into the new mq_timedreceive2_args structure, freeing up two slots for new parameters to the system call. That structure is passed by pointer, without using the common pattern of passing its length, which would make future additions easier; that may change if this patch series moves forward.

The new arguments are flags and index. In this series, only one flag (MQ_PEEK) is defined; if it is present, the message will be returned as usual, but without removing it from the queue, meaning that it will still be there the next time a receive operation is performed. The index argument indicates which message is of interest; a value of zero will return the highest-priority message, and higher values will return messages further back in the queue.

There are a few use cases for these features described in the patch cover letter. One would be monitoring tools, which may want to look at the message traffic without interfering with it. Another one is Checkpoint/Restore in Userspace, which can read a series of messages out of a queue, then restore them with the rest of the process at a future time.

The series as a whole has not received much attention so far, which is perhaps unsurprising given that few developers have much interest in POSIX message queues. If this work is to proceed, it will need to attract some reviews, and probably go through some more rounds to address the problems that are found.

IPC in io_uring

Since its inception, the io_uring subsystem has steadily gained functionality. After having started as the asynchronous I/O mechanism that Linux has long lacked, it has evolved into a separate system-call interface providing access to increasing amounts of kernel functionality. While io_uring can be used for interprocess communication (by way of Unix-domain sockets, for example), it has not yet acquired its own IPC scheme. This patch series from Daniel Hodges seeks to change that situation, but it probably needs a fair amount of work to get there.

Hodges's goal is to provide a high-bandwidth IPC mechanism, similar to D-Bus, that will perform well on large systems. By using shared ring buffers, processes should be able to communicate with minimal copying of data. It is worth noting that other developers have attempted to solve this problem over the years, generally without success; see, for example, the sad story of kdbus. Hope springs eternal, though, and perhaps io_uring is the platform upon which a successful solution can be built.

There are facilities for direct and broadcast messages. Communication is done through "channels"; it all starts when one process issues at least one IORING_REGISTER_IPC_CHANNEL_CREATE operation to establish an open channel. Other processes can attach to existing channels if the permissions allow. Two basic operations, IORING_OP_IPC_SEND and IORING_OP_IPC_RECV, are used to send and receive messages, respectively. There is no documentation, naturally, but interested readers can look at this patch containing a set of self-tests that exercise the new features.
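The intended flow might be sketched roughly as follows. Only the IORING_* names come from the posted patch series; the liburing-style helpers and argument layouts around them are assumptions for illustration and do not exist today, so this is pseudocode rather than working code:

```c
/* Hypothetical sketch of the proposed io_uring IPC flow. */

/* Creator: establish a channel through the register interface. */
io_uring_register(ring_fd, IORING_REGISTER_IPC_CHANNEL_CREATE,
                  &channel_args, 1);            /* args layout per patch */

/* Sender: queue a send operation on the channel. */
sqe = io_uring_get_sqe(&ring);
io_uring_prep_rw(IORING_OP_IPC_SEND, sqe, channel_fd, msg, msg_len, 0);
io_uring_submit(&ring);

/* Receiver (attached to the same channel): post a receive and wait. */
sqe = io_uring_get_sqe(&ring);
io_uring_prep_rw(IORING_OP_IPC_RECV, sqe, channel_fd, buf, buf_len, 0);
io_uring_submit_and_wait(&ring, 1);
```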

The io_uring maintainer, Jens Axboe, quickly noticed that the patch showed signs of LLM-assisted creation, something that Hodges owned up to. He also noted that the series falls short of being a complete D-Bus replacement, lacking features like credential management. Still, Axboe agreed that an IPC feature for io_uring "makes sense to do" and seemed happy with the overall design of the code. Some of the questions he asked, though, went unanswered. For this work to proceed, Hodges will need to return and do the hard work of bringing a proof-of-concept patch up to the level needed for integration into a core subsystem like io_uring.

Bus1 returns

Back in 2016, David Rheinsberg (then known as David Herrmann) proposed a new kernel subsystem called "bus1", which would provide kernel-mediated interprocess communication along the lines of D-Bus. It allowed the passing of messages, but also of capabilities, represented by bus1 handles and open file descriptors. The proposal attracted some attention, and brought some interesting ideas (see the above-linked article for details), but stalled fairly quickly and was never seriously considered for merging into the mainline kernel.

Ten years later, bus1 is back, posted this time by David Rheinsberg. The code has seen a few changes in the intervening decade:

The biggest change is that we stripped everything down to the basics and reimplemented the module in Rust. It is a delight not having to worry about refcount ownership and object lifetimes, but at the cost of a C<->Rust bridge that brings some challenges.

The core features of bus1 remain similar to what was proposed in 2016. For the time being, Rheinsberg is focusing on the Rust aspects of the work and requesting help from the Rust for Linux community to get that integration into better shape.

At some future time, presumably, the new bus1 implementation will be more widely exposed within the kernel community, at which point we will see if there is an appetite for this kind of in-kernel IPC mechanism or not. For those who would like an early look, this patch contains documentation on how the bus1 API will work, though with a number of details left unspecified.

[Editor's note: we originally missed that David had changed his name. Apologies for the error.]
