How to stop Linux threads cleanly

Original link: https://mazzo.li/posts/stopping-linux-threads.html

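## Stopping threads gracefully on Linux: summary

Managing threads in long-running Linux applications (such as databases or servers) raises one particular challenge: stopping them cleanly. Simply killing a thread can leak resources: unfreed memory, held locks, and so on. Starting threads is easy, but shutting them down gracefully is not.

A simple solution such as a busy-waiting loop (repeatedly checking a "stop" flag) works if each unit of the thread's work is short. Blocking operations such as network calls complicate matters. Signals are the main interruption mechanism, but cancelling a thread through signals can be unsafe: it may interrupt critical sections and cause resource problems. Modern C++ adds further complications, since forced unwinding can crash the program.

Ideally you want controlled cancellation, enabled only around blocking system calls. Even that is not foolproof for *all* system calls. Recent Linux kernel features such as `rseq` offer a way to check the stop flag atomically *before* a system call, giving a more robust solution, although a complex one that requires inline assembly.

Ultimately, stopping threads cleanly usually takes careful design: avoiding long critical sections and possibly isolating untrusted code in separate processes so that resources are cleaned up. There is no perfect solution, which highlights the shortcomings of the standard thread-management facilities.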


Original text

Let’s say you’re writing a long running multi-threaded application, on Linux. Maybe it’s a database or a server of some sort. Let’s also imagine that you’re not running on some managed runtime (maybe the JVM, Go, or BEAM), but rather managing threads spawned using the clone syscall. Think of threads created in C with pthread_create, or using C++’s std::thread.

Once you get into the business of starting threads, you’re probably also in the business of stopping them. However, the former is much easier than the latter. By “stopping” I mean stopping the thread while giving it a chance to run some cleanup operations before fully terminating. In other words, we want to terminate a thread while ensuring that memory is freed, locks are released, logs are flushed, and so on.

This task is sadly not as straightforward as it should be, and there definitely isn’t a one-size-fits-all solution. This blog post aims to give an overview of the problem space, to highlight some pitfalls in an area with no shortage of them, and to present a little magic trick at the end.

(Quasi-)busy looping #

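If you can afford it, you can structure each thread as a loop which checks a per-thread boolean stop flag before every unit of work, where each unit completes in a reasonable time.

When we want to stop a thread, we set its flag to true, and then call pthread_join or equivalents to ensure that the thread has actually terminated.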

Here’s a contrived but working example in C++:
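A minimal sketch of the pattern, assuming a short sleep as a stand-in for a bounded unit of work. `Worker` and `run_and_stop` are hypothetical names chosen for illustration:

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical worker: counts iterations until asked to stop.
struct Worker {
    std::atomic<bool> stop{false};
    std::atomic<long> iterations{0};

    void run() {
        while (!stop.load(std::memory_order_acquire)) {
            // Stand-in for a unit of work that completes in a reasonable
            // time, e.g. a syscall issued with a short timeout.
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
            iterations.fetch_add(1, std::memory_order_relaxed);
        }
        // Cleanup (flushing logs, releasing resources) would go here.
    }
};

// Starts the worker, lets it run for a while, then stops and joins it.
// Returns how many units of work were completed.
long run_and_stop(std::chrono::milliseconds how_long) {
    Worker w;
    std::thread t([&w] { w.run(); });
    std::this_thread::sleep_for(how_long);
    w.stop.store(true, std::memory_order_release);
    t.join();  // std::thread's counterpart to pthread_join
    return w.iterations.load();
}
```

The latency of a stop request is bounded by the longest unit of work, which is why the pattern only fits threads whose work can be chopped up this way.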

#

Quasi-busy loops are all well and good, but they’re sometimes not desirable. The most common roadblock is foreign code that we don’t control which does not fit this pattern – think of a third-party library doing some blocking network call.

As we’ll see later, there’s essentially no clean way to stop a thread running code we don’t control, but there are other reasons to not want to write all our code with the quasi-busy loop pattern.

If we have many threads, even relatively slow timeouts might cause significant scheduling overhead due to spurious wakeups, especially on an already busy system. The timeouts will also make debugging and inspecting the system considerably more annoying (e.g. imagine what the output of strace would look like).

So it is worth thinking about how to stop a thread while it is blocked on a syscall. The most straightforward way to do that is through signals.

We need to talk about signals #

Signals are the main way to interrupt execution of a thread without explicit coordination of the interrupted thread, and are therefore very relevant to the topic of this blog post. They’re also a bit of a mess. These two facts generate unhappiness.

For a good overview on signals I recommend the surprisingly informative man page, but I’ll give a sufficient overview here. If you already know how signals work, you can skip to the next section.

Signals can arise because of some hardware exception or be initiated by software. The most familiar instance of a software-initiated signal is your shell sending SIGINT to the foreground process when you press ctrl-c. All signals initiated by software originate from a handful of syscalls – for instance pthread_kill will send a signal to a thread.

Hardware-initiated signals are generally handled immediately, while software-initiated signals are handled when a CPU is about to re-enter user mode after the kernel has done some work. In any event, when a signal needs to be handled in a given thread:

  1. If the signal has been blocked by the receiving thread, it’ll wait to be handled until it is unblocked;

  2. If the signal is not blocked, it might be:

    a. ignored;
    b. handled in the “default” manner;
    c. handled using some custom signal handler.

Which signals are blocked is controlled by modifying the signal mask using sigprocmask/pthread_sigmask, and which action is taken if the signal is not blocked is controlled by sigaction.

Assuming that the signal is not blocked, paths 2.a and 2.b will be managed entirely by the kernel, while path 2.c will cause the kernel to pass control to a user-space signal handler which will do something with the signal.
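The interplay between the signal mask and a custom handler can be observed in a single thread. The following is a small sketch, not taken from the original post: `on_usr1` and `blocked_signal_is_deferred` are hypothetical names, and SIGUSR1 is an arbitrary choice of signal.

```cpp
#include <pthread.h>
#include <signal.h>

static volatile sig_atomic_t usr1_handled = 0;

static void on_usr1(int) { usr1_handled = 1; }

// Returns true if a blocked SIGUSR1 stayed pending until it was unblocked.
bool blocked_signal_is_deferred() {
    usr1_handled = 0;

    // Install a custom handler with sigaction.
    struct sigaction sa = {};
    struct sigaction old_sa;
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old_sa) != 0) return false;

    // Block SIGUSR1 in the calling thread.
    sigset_t usr1, old_mask;
    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &usr1, &old_mask);

    // Send the signal to ourselves: it must not be handled yet.
    pthread_kill(pthread_self(), SIGUSR1);
    bool ran_while_blocked = usr1_handled != 0;

    sigset_t pending;
    sigpending(&pending);
    bool was_pending = sigismember(&pending, SIGUSR1) == 1;

    // Unblocking delivers the pending signal before pthread_sigmask returns.
    pthread_sigmask(SIG_UNBLOCK, &usr1, nullptr);
    bool ran_after_unblock = usr1_handled != 0;

    // Restore the previous mask and disposition.
    pthread_sigmask(SIG_SETMASK, &old_mask, nullptr);
    sigaction(SIGUSR1, &old_sa, nullptr);
    return !ran_while_blocked && was_pending && ran_after_unblock;
}
```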

Importantly, if some thread is in a syscall (for instance blocked while reading from a socket), and a signal needs to be handled, the syscall will return early with error code EINTR after the signal handler has run.

The signal handler code is subject to various constraints, but otherwise it can do as it pleases, including deciding to not give back control to the code that was executing before. By default, most signals just cause the program to stop abruptly, possibly with a core dump. In the next few sections we’re going to explore various ways to use signals to stop our threads.

Thread cancellation, a false hope #

Let’s first examine a way to stop threads, implemented through signals, which would seem to do exactly what we want: thread cancellation.

The API for thread cancellation is very promising. pthread_cancel(tid) will “cancel” thread tid. The way pthread_cancel works boils down to:

  1. A special signal is sent to thread tid;
  2. The libc you’re using (say glibc or musl) sets up a handler so that when the cancel signal is received the thread winds down.

There are additional details, but that’s essentially all there is to it. However, trouble lies ahead.

Resource management + thread cancellation = 😢 #

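Recall that signals can essentially arise anywhere in your code. So if we have code that takes a lock, does some work, and then releases the lock, the thread might be cancelled in between, leaving the lock held forever.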
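For instance, here is a minimal sketch of explicit locking around a blocking call, with a `sleep` standing in for the real work. The names `demo_mutex`, `locked_worker` and `cancel_leaves_mutex_locked` are hypothetical, and the demo is one-shot since it deliberately leaves the mutex locked:

```cpp
#include <atomic>
#include <cerrno>
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static std::atomic<bool> lock_taken{false};

static void* locked_worker(void*) {
    pthread_mutex_lock(&demo_mutex);
    lock_taken.store(true);
    sleep(1000);                        // stand-in for work; a cancellation point
    pthread_mutex_unlock(&demo_mutex);  // never reached if cancelled above
    return nullptr;
}

// One-shot demo: returns true if the cancelled thread left the mutex locked.
bool cancel_leaves_mutex_locked() {
    pthread_t tid;
    if (pthread_create(&tid, nullptr, locked_worker, nullptr) != 0) return false;
    while (!lock_taken.load()) usleep(1000);  // wait until the lock is held
    pthread_cancel(tid);
    void* result = nullptr;
    pthread_join(tid, &result);
    // The thread is gone, but nobody ever unlocked the mutex.
    return result == PTHREAD_CANCELED &&
           pthread_mutex_trylock(&demo_mutex) == EBUSY;
}
```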

Thread cancellation is incompatible with modern C++ #

If you’re a C++/Rust programmer, you might have sneered at the explicit locking above – you’ve got RAII to handle such cases:
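A sketch of the RAII version, assuming glibc, where cancellation unwinds the stack and runs destructors. Other libcs may behave differently, so treat the outcome as an assumption about the platform rather than a guarantee. `raii_mutex`, `raii_worker` and `raii_releases_on_cancel` are hypothetical names:

```cpp
#include <atomic>
#include <mutex>
#include <pthread.h>
#include <unistd.h>

static std::mutex raii_mutex;
static std::atomic<bool> guard_taken{false};

static void* raii_worker(void*) {
    std::lock_guard<std::mutex> guard(raii_mutex);
    guard_taken.store(true);
    sleep(1000);  // stand-in for work; a cancellation point
    return nullptr;
}  // on glibc, guard's destructor also runs if the thread is cancelled

// Returns true if the mutex was released when the thread was cancelled.
bool raii_releases_on_cancel() {
    pthread_t tid;
    if (pthread_create(&tid, nullptr, raii_worker, nullptr) != 0) return false;
    while (!guard_taken.load()) usleep(1000);  // wait until the lock is held
    pthread_cancel(tid);
    void* result = nullptr;
    pthread_join(tid, &result);
    bool released = raii_mutex.try_lock();
    if (released) raii_mutex.unlock();
    return result == PTHREAD_CANCELED && released;
}
```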

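On glibc, cancellation is implemented as a forced stack unwind which runs destructors along the way, much like an exception. However, we’re always liable to a cancellation happening in a noexcept block, which will cause your program to crash via std::terminate.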

So since C++11, which introduced noexcept and made destructors noexcept by default, thread cancellation is essentially useless in C++.

Forced unwinding is unsafe anyway #

You can’t cleanly stop threads running code you don’t control #

As a brief aside, the nature of signals (and by extension thread cancellation) implies that it’s impossible to cleanly stop code that you don’t control. You cannot guarantee that memory isn’t leaked, files are closed, global locks are released, and so on.

If you need to interrupt foreign code reliably, it’s better to isolate it in its own process. It might still leak temporary files and other such persistent resources, but most relevant state would be cleaned up by the operating system when the process dies.
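A minimal sketch of that isolation using fork, where `foreign_code` is a hypothetical stand-in for a library call that never returns on its own:

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for foreign code that blocks indefinitely.
static void foreign_code() {
    for (;;) pause();
}

// Runs foreign_code in a child process, kills it, and reaps it.
// Returns the signal that terminated the child, or -1 on failure.
int run_isolated_and_kill() {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        foreign_code();
        _exit(0);  // not reached
    }
    // The kernel reclaims the child's memory, file descriptors and locks.
    if (kill(pid, SIGKILL) != 0) return -1;
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

A real setup would also need some channel (a pipe, a socket, shared memory) to get results back from the child, which is the main cost of this design.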

Controlled thread cancellation #

Hopefully you’re now convinced that unrestricted thread cancellation is not a great idea in most circumstances. However we can pick the circumstances explicitly by enabling thread cancellation only at specific times. So our event loop becomes:
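A minimal sketch of such a loop, with a pipe standing in for a socket. `LoopState`, `event_loop` and `controlled_cancel_demo` are hypothetical names:

```cpp
#include <atomic>
#include <pthread.h>
#include <unistd.h>

struct LoopState {
    int fd = -1;                 // read end of a pipe, standing in for a socket
    std::atomic<long> bytes{0};  // bytes processed so far
};

static void* event_loop(void* arg) {
    auto* st = static_cast<LoopState*>(arg);
    int old_state;
    // Cancellation is off by default in this thread...
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);
    char buf[256];
    for (;;) {
        // ...and only enabled around the blocking call.
        pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &old_state);
        ssize_t n = read(st->fd, buf, sizeof buf);
        pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);
        if (n <= 0) break;
        st->bytes.fetch_add(n);  // processing runs with cancellation disabled
    }
    return nullptr;
}

// Returns true if the thread processed the data and was then cancelled.
bool controlled_cancel_demo() {
    int fds[2];
    if (pipe(fds) != 0) return false;
    LoopState st;
    st.fd = fds[0];
    pthread_t tid;
    if (pthread_create(&tid, nullptr, event_loop, &st) != 0) return false;
    if (write(fds[1], "hello", 5) != 5) return false;
    while (st.bytes.load() < 5) usleep(1000);  // wait for the data to be processed
    pthread_cancel(tid);  // takes effect only inside read()
    void* result = nullptr;
    pthread_join(tid, &result);
    close(fds[0]);
    close(fds[1]);
    return result == PTHREAD_CANCELED && st.bytes.load() == 5;
}
```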

#

However once we’ve done this, it might be worth getting rid of thread cancellation entirely. Relying on the stack unwinding to free resources would not be portable to alternative libcs, and we’d need to be fairly careful if we wanted to perform some explicit cleanup actions outside destructors.

So instead we can work with signals directly. We can pick SIGUSR1 as our “stopping” signal, install a handler which sets our stopping variable, and check the variable before doing blocking syscalls.

Here’s a worked out example in C++. The interesting parts of the code are setting up the signal handler:
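A sketch of that setup, assuming a pipe in place of a socket and a thread-local flag. All names (`stop_requested`, `install_stop_handler`, `Receiver`, `receive_loop`, `stop_with_sigusr1`) are hypothetical:

```cpp
#include <atomic>
#include <cerrno>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

// Per-thread stop flag, written only by the signal handler.
static thread_local volatile sig_atomic_t stop_requested = 0;

static void on_stop_signal(int) { stop_requested = 1; }

static bool install_stop_handler() {
    struct sigaction sa = {};
    sa.sa_handler = on_stop_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;  // no SA_RESTART, so a blocked read() fails with EINTR
    return sigaction(SIGUSR1, &sa, nullptr) == 0;
}

struct Receiver {
    int fd = -1;
    std::atomic<bool> done{false};
    bool saw_stop = false;
};

static void* receive_loop(void* arg) {
    auto* r = static_cast<Receiver*>(arg);
    char buf[256];
    while (!stop_requested) {
        // A signal landing between the check above and read() below is
        // not noticed until read() returns for some other reason.
        ssize_t n = read(r->fd, buf, sizeof buf);
        if (n < 0 && errno == EINTR) continue;  // interrupted: re-check the flag
        if (n <= 0) break;                      // error or end of stream
    }
    r->saw_stop = stop_requested != 0;
    // Explicit cleanup would go here.
    r->done.store(true);
    return nullptr;
}

// Returns true if the thread left its loop because of the stop signal.
bool stop_with_sigusr1() {
    if (!install_stop_handler()) return false;
    int fds[2];
    if (pipe(fds) != 0) return false;
    Receiver r;
    r.fd = fds[0];
    pthread_t tid;
    if (pthread_create(&tid, nullptr, receive_loop, &r) != 0) return false;
    // Re-send until the thread exits, papering over the window noted above.
    while (!r.done.load()) {
        pthread_kill(tid, SIGUSR1);
        usleep(1000);
    }
    pthread_join(tid, nullptr);
    close(fds[0]);
    close(fds[1]);
    return r.saw_stop;
}
```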

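This approach has a race: if the signal arrives after the thread has checked the stopping variable but before the blocking syscall starts, the handler sets the variable, yet the syscall blocks anyway and the stop request goes unnoticed.

Another approach to this problem would be to have SIGUSR1 blocked normally, and unblock it only when the syscall runs, similarly to what we did with the temporary thread cancellation. If the syscall terminates with EINTR, we know that we should quit.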

Sadly the race is still there, just between the unblocking and running the syscall:
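A sketch of the pattern with the window marked in a comment. The helper name `read_with_stop_signal_unblocked` is hypothetical, and it assumes SIGUSR1 is otherwise blocked in the calling thread and has a handler installed without SA_RESTART:

```cpp
#include <cerrno>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

// Unblocks SIGUSR1 only for the duration of the read.
// A result of -1 with errno == EINTR means a stop was requested.
ssize_t read_with_stop_signal_unblocked(int fd, void* buf, size_t len) {
    sigset_t usr1;
    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);

    pthread_sigmask(SIG_UNBLOCK, &usr1, nullptr);
    // Race window: a signal delivered here runs the handler before read()
    // has started, so read() then blocks without ever returning EINTR.
    ssize_t n = read(fd, buf, len);
    int saved_errno = errno;
    pthread_sigmask(SIG_BLOCK, &usr1, nullptr);
    errno = saved_errno;  // keep read()'s errno for the caller
    return n;
}
```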

#

However, there often is an easy way to atomically change the sigmask and run a syscall:

  • select/poll/epoll_wait have pselect/ppoll/epoll_pwait variants which take a sigmask argument;
  • read/write and similar syscalls can be replaced by their non-blocking versions and a blocking ppoll;
  • To sleep one can use timerfd or just ppoll with no file descriptors but with a timeout;
  • The newly added io_uring_enter supports this use case out of the box.

The syscalls above already cover a very large footprint.

In this style, the receive loop of the program becomes:
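A sketch of such a loop built on ppoll and a non-blocking read, again with a pipe standing in for a socket. `PollReceiver`, `ppoll_loop` and `ppoll_stop_demo` are hypothetical names, and the handler is deliberately empty since EINTR from ppoll is itself the stop request:

```cpp
#include <atomic>
#include <cerrno>
#include <fcntl.h>
#include <poll.h>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

static void noop_handler(int) {}  // only needed so that ppoll fails with EINTR

struct PollReceiver {
    int fd = -1;
    std::atomic<long> bytes{0};
    bool stopped_by_signal = false;
};

static void* ppoll_loop(void* arg) {
    auto* r = static_cast<PollReceiver*>(arg);
    // Mask used only while inside ppoll: the current one minus SIGUSR1.
    sigset_t during_poll;
    pthread_sigmask(SIG_SETMASK, nullptr, &during_poll);
    sigdelset(&during_poll, SIGUSR1);
    char buf[256];
    for (;;) {
        struct pollfd pfd = {r->fd, POLLIN, 0};
        // The mask swap and the wait happen atomically inside the kernel.
        int rc = ppoll(&pfd, 1, nullptr, &during_poll);
        if (rc < 0 && errno == EINTR) { r->stopped_by_signal = true; break; }
        if (rc < 0) break;
        ssize_t n = read(r->fd, buf, sizeof buf);  // fd is non-blocking
        if (n > 0) r->bytes.fetch_add(n);
        else if (n == 0) break;  // writer closed
    }
    return nullptr;
}

bool ppoll_stop_demo() {
    struct sigaction sa = {};
    sa.sa_handler = noop_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, nullptr) != 0) return false;

    // Block SIGUSR1 before spawning so that the worker inherits the mask.
    sigset_t usr1, old_mask;
    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &usr1, &old_mask);

    int fds[2];
    if (pipe(fds) != 0) return false;
    fcntl(fds[0], F_SETFL, O_NONBLOCK);
    PollReceiver r;
    r.fd = fds[0];
    pthread_t tid;
    if (pthread_create(&tid, nullptr, ppoll_loop, &r) != 0) return false;
    if (write(fds[1], "hello", 5) != 5) return false;
    while (r.bytes.load() < 5) usleep(1000);  // wait for the data to be read
    pthread_kill(tid, SIGUSR1);  // one signal suffices: it stays pending until ppoll
    pthread_join(tid, nullptr);
    pthread_sigmask(SIG_SETMASK, &old_mask, nullptr);
    close(fds[0]);
    close(fds[1]);
    return r.stopped_by_signal && r.bytes.load() == 5;
}
```

Because the signal is blocked everywhere except inside ppoll, a signal sent while the thread is processing data simply stays pending and interrupts the next ppoll call.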

#

Sadly, not all syscalls have variants which let us atomically change the sigmask as they execute. futex, the main syscall used to implement userspace concurrency primitives, is a notable example of a syscall which does not include such a facility.

In the case of futex one can interrupt threads through FUTEX_WAKE, but it turns out we can set up a mechanism to safely check the boolean stop flag atomically with starting any syscall.

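To recap, the problematic code checks the stop flag and then issues the blocking syscall, and a signal arriving between the two steps is missed.

Linux offers rseq (“restartable sequences”), which lets us make the check and the syscall atomic with respect to signals, although with some effort. The rseq machinery works as follows: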

  • You write some code which you want to run atomically with regards to preemption or signals – the critical section.

  • Before the critical section is entered, we inform the kernel that the critical section is about to run by writing to a bit of memory shared between the kernel and userspace.

  • This bit of memory contains:

    1. start_ip, the instruction pointer which marks the beginning of the critical section;
    2. post_commit_offset, the length of the critical section;
    3. abort_ip, the instruction pointer to jump to if the kernel needs to preempt the critical section.

  • If the kernel has preempted a thread, or if a signal needs to be delivered to the thread, it checks whether the thread is in an rseq critical section, and if it is, sets the program counter for the thread to abort_ip.
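The check in the last step can be modelled in a few lines. This is a simplified illustration rather than kernel code: `CriticalSectionSketch` is a hypothetical mirror of the fields listed above (the real definition is `struct rseq_cs` in the kernel's `linux/rseq.h`), and `in_critical_section` and `resume_ip` are made-up helper names.

```cpp
#include <cstdint>

// Illustrative mirror of the critical section descriptor.
struct CriticalSectionSketch {
    std::uint32_t version;
    std::uint32_t flags;
    std::uint64_t start_ip;            // first instruction of the critical section
    std::uint64_t post_commit_offset;  // length of the critical section in bytes
    std::uint64_t abort_ip;            // where to resume if interrupted
};

// True if ip lies in [start_ip, start_ip + post_commit_offset).
// Unsigned wrap-around lets a single comparison cover both bounds.
constexpr bool in_critical_section(const CriticalSectionSketch& cs,
                                   std::uint64_t ip) {
    return ip - cs.start_ip < cs.post_commit_offset;
}

// Where the thread resumes after being preempted or receiving a signal.
constexpr std::uint64_t resume_ip(const CriticalSectionSketch& cs,
                                  std::uint64_t ip) {
    return in_critical_section(cs, ip) ? cs.abort_ip : ip;
}
```

Note that an instruction pointer at or past `start_ip + post_commit_offset` is left alone, which is why the commit instruction has to be the very last instruction of the section.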

The process above forces the critical section to be a single contiguous block (from start_ip to start_ip+post_commit_offset) which we must know the address of. These requirements force us to write it in inline assembly.

Note that rather than disabling preemption entirely, rseq lets us specify some code (the code starting at abort_ip) to perform some cleanup if the critical section is interrupted. The correct functioning of the critical section therefore often depends on a “commit instruction” at the very end of the critical section which makes the changes in the critical section visible.

In our case the “commit instruction” is syscall – the instruction which will invoke the syscall that we’re interested in.

Which leads us to the following x86-64 widget for a 6-argument syscall stub which atomically checks a stop flag and executes a syscall:

The stub relies on glibc’s recently added support for rseq, which provides a __rseq_offset variable containing the offset where the critical section information lives, relative to the thread pointer. All we need to do in the critical section is check the flag, skip the syscall if it’s set, and perform the syscall if it isn’t. If the flag is set we pretend the syscall has failed with EINTR.

You can find the full code for the previous example using this trick to call recvfrom here. I’m not necessarily advocating the use of this technique, but it’s definitely an interesting curiosity.

Wrapping up #

It’s quite frustrating that there’s no agreed upon way to interrupt and stack unwind a Linux thread and to protect critical sections from such unwinding. There are no technical obstacles to such facilities existing, but clean teardown is often a neglected part of software.

Haskell is one language where these capabilities do exist in the form of asynchronous exceptions, although one still needs to be careful to protect critical sections appropriately.

Acknowledgements #

Peter Cawley provided input on many details treated in this blog post and read its draft. He also suggested rseq as a possible solution. Many thanks also go to Niklas Hambüchen, Alexandru Scvorțov, Alex Sayers, and Alex Appetiti for reading drafts of this blog post.
