AWS engineer reports PostgreSQL performance halved by Linux 7.0, fix may not be easy

Original link: https://www.phoronix.com/news/Linux-7.0-AWS-PostgreSQL-Drop

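A performance regression in the upcoming Linux 7.0 kernel cuts PostgreSQL database throughput roughly in half, notably on Graviton4 servers. The problem stems from a change that restricts the kernel's preemption modes, which increases the time spent in a user-space spinlock. A patch has been proposed to revert the change, but it is unlikely to be accepted. Kernel developer Peter Zijlstra suggests that the solution is for PostgreSQL to adopt the Restartable Sequences (RSEQ) time slice extension, which is also introduced in Linux 7.0. This means PostgreSQL may need an update to regain the performance seen on earlier kernels. If the issue is not resolved, Linux 7.0, expected in about two weeks, and Ubuntu 26.04 LTS, which it will power, may ship with noticeably reduced PostgreSQL performance until the database server is updated.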

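Recent reports show PostgreSQL performance dropping significantly on Linux kernel 7.0, in some cases by half. The problem stems from a change in how the kernel handles preemption, and the fix is not simple. Some suggest working around the problem by using huge pages or by disabling preemption via `sysctl`, but what is really needed for now is a kernel patch. There is concern about breaking user-space applications without a transition period, and other applications may be affected in a similar way. The debate centers on whether this counts as "breakage" or merely a performance regression, with some arguing that a 50% performance loss is huge. The engineer responsible for the kernel change suggested that PostgreSQL adapt to the new preemption mechanism, a suggestion that PostgreSQL developers have questioned. Linux 7.0 is scheduled for release in two weeks and will power Ubuntu 26.04 LTS, which means many users will soon run into the problem.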
Original text
An Amazon/AWS engineer raised the alarm on Friday over the current Linux 7.0 development kernel cutting throughput for the PostgreSQL database server to around half that of prior kernel versions. The culprit halving PostgreSQL performance is known, but a revert looks like it may not happen, and the current suggestion is that PostgreSQL may need to be adapted.

Salvatore Dipietro of Amazon/AWS reported a throughput and latency regression for PostgreSQL. The report found Linux 7.0 in its near-final form delivering around 0.51x the throughput of prior kernels on a Graviton4 server, because much more time is now being spent in a user-space spinlock.
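To make the spinlock point concrete, here is a minimal sketch of a test-and-set user-space spinlock in C. It is not PostgreSQL's actual spinlock code; every name in it (`demo_spinlock`, `demo_run`, and so on) is hypothetical and chosen only for illustration. It shows the general hazard: if the kernel preempts the thread that holds the lock between `demo_spin_lock` and `demo_spin_unlock`, every other thread burns CPU in the busy-wait loop until the holder is scheduled again.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical test-and-set spinlock, for illustration only. */
typedef struct {
    atomic_flag held;
} demo_spinlock;

static void demo_spin_lock(demo_spinlock *l)
{
    /* Busy-wait until the flag was previously clear. A waiter makes no
     * progress here while the lock holder is descheduled. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        /* spin */
    }
}

static void demo_spin_unlock(demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static demo_spinlock demo_lock = { ATOMIC_FLAG_INIT };
static long demo_counter;

static void *demo_worker(void *arg)
{
    long iters = *(long *)arg;
    for (long i = 0; i < iters; i++) {
        demo_spin_lock(&demo_lock);
        demo_counter++;              /* short critical section */
        demo_spin_unlock(&demo_lock);
    }
    return NULL;
}

/* Runs up to 16 workers contending on the lock; returns the final counter. */
static long demo_run(int nthreads, long iters)
{
    pthread_t tids[16];
    if (nthreads > 16)
        nthreads = 16;
    demo_counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, demo_worker, &iters);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return demo_counter;
}
```

Calling `demo_run(4, 20000)` should return 80000 regardless of scheduling; what changes with the preemption model is how long the waiters spend spinning, not the result.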

Bisecting traced the regression back to the Linux 7.0 change that restricts the available preemption modes for the kernel. That change was previously covered on Phoronix in "Linux 7.0 To Focus Just On Full & Lazy Preemption Models For Up-To-Date CPU Archs" and was in turn upstreamed with the Linux 7.0 scheduler updates.
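For readers who want to check which preemption model a running kernel is using, the following is a rough sketch in C. It assumes a kernel built with dynamic preemption that exposes `/sys/kernel/debug/sched/preempt` (debugfs mounted, root access) and that the file lists the modes on one line with the active one in parentheses, for example `none voluntary full (lazy)`. That format is an assumption based on earlier kernels and may differ on Linux 7.0. The function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Copies the parenthesized token of `line` into `out`.
 * Returns 0 on success, -1 if no "(mode)" marker fits in the buffer. */
static int active_preempt_mode(const char *line, char *out, size_t outlen)
{
    const char *open = strchr(line, '(');
    if (!open)
        return -1;
    const char *close = strchr(open, ')');
    if (!close)
        return -1;
    size_t len = (size_t)(close - open - 1);
    if (len == 0 || len >= outlen)
        return -1;
    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}

/* Reads the first line of `path` (assumed: /sys/kernel/debug/sched/preempt)
 * and extracts the active mode. Returns -1 if the file cannot be read. */
static int read_preempt_mode(const char *path, char *out, size_t outlen)
{
    char line[128];
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    return ok ? active_preempt_mode(line, out, outlen) : -1;
}
```

A caller would pass the debugfs path to `read_preempt_mode` and treat a return value of -1 as "unknown", since the file is absent on kernels without dynamic preemption or without debugfs access.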

As a result, a patch was posted to the Linux kernel mailing list yesterday to restore PREEMPT_NONE as the default, given the severity of the reported regression.

[Figure: pgbench regression benchmark]

Although it would fix an active performance regression, this change to restore PREEMPT_NONE as the default preemption model might not be picked up. Peter Zijlstra, who authored the original code simplifying the preemption modes, has responded that the "fix" is to make PostgreSQL use the Restartable Sequences (RSEQ) time slice extension. Support for that time slice extension was also upstreamed for Linux 7.0.
"The fix here is to make PostgreSQL make use of rseq slice extension:

https://lkml.kernel.org/r/[email protected]

That should limit the exposure to lock holder preemption (unless PostgreSQL is doing seriously egregious things)."


So if that stands, with the blame shifted to PostgreSQL, Linux 7.0 stable could bring a significant drop in PostgreSQL performance in some scenarios until that popular database server is updated.

Linux 7.0 stable is due out in about two weeks. This is also the kernel version powering Ubuntu 26.04 LTS to be released later in April.
