The Serial Safety Net: Efficient Concurrency Control on Modern Hardware

Original link: http://muratbuffalo.blogspot.com/2026/03/the-serial-safety-net-efficient.html

## The Serial Safety Net: Serializability Without Sacrificing Performance

This post introduces the Serial Safety Net (SSN), an approach that achieves serializable transactions in databases without the performance degradation of traditional methods such as two-phase locking (2PL). Modern systems typically favor speed and adopt weaker isolation levels (read committed or snapshot isolation), which can allow data anomalies such as write skew or non-repeatable reads. SSN acts as a "certifier" built *on top of* these faster, weaker schemes. It tracks transaction dependencies and "blesses" only transactions that are serializable with respect to the others, guaranteeing a cycle-free execution history. SSN uses timestamps to compute "watermarks", essentially checking whether a transaction's dependencies close a cycle in time. While similar to optimistic concurrency control (OCC), SSN avoids OCC's "retry bloodbath" thanks to its "safe retry" property: conflicts resolve on retry because the conflicting transaction has already committed. However, SSN must store extra timestamps, adding metadata overhead, and it does not natively handle phantom insertions (additional mechanisms are needed). The post argues that combining SSN with snapshot isolation offers the best balance, providing application-level consistency *during* transaction execution, avoiding crashes caused by inconsistent reads, while SSN guarantees overall serializability.


This paper proposes a way to get serializability without completely destroying your system's performance. I quite like the paper, as it flips the script on how we think about database isolation levels. 

The Idea

In modern hardware setups (where we have massive multi-core processors, huge main memory, and I/O is no longer the main bottleneck), strict concurrency control schemes like Two-Phase Locking (2PL) choke the system due to contention on centralized structures. To keep things fast, most systems default to weaker schemes like Snapshot Isolation (SI) or Read Committed (RC) at the cost of allowing dependency cycles and data anomalies. Specifically, RC leaves your application vulnerable to unrepeatable reads as data shifts mid-flight, while SI famously opens the door to write skew, where two concurrent transactions update different halves of the same logical constraint.
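As a toy illustration of write skew (a hypothetical sketch in plain Python, not real database code), consider the classic on-call example: under SI both transactions read the same start-time snapshot and write disjoint keys, so SI's write-write conflict check aborts neither, yet together they break the shared invariant:

```python
# Hypothetical sketch of write skew under snapshot isolation (SI).
# Invariant: at least one doctor must stay on call. SI only aborts on
# write-write conflicts, and these two transactions write disjoint keys.

snapshot = {"alice": "on-call", "bob": "on-call"}  # shared start snapshot

def go_off_call(me, view):
    """Write {me: off-call} only if the snapshot shows someone else on call."""
    others_on_call = any(v == "on-call" for k, v in view.items() if k != me)
    return {me: "off-call"} if others_on_call else None

# Both transactions read the SAME snapshot, then write DIFFERENT keys.
w1 = go_off_call("alice", snapshot)
w2 = go_off_call("bob", snapshot)

db = dict(snapshot)
for w in (w1, w2):   # disjoint write sets: SI happily commits both
    if w:
        db.update(w)

print(db)  # {'alice': 'off-call', 'bob': 'off-call'} -- invariant violated
```

Each transaction's check was locally valid against its snapshot; only the combination is wrong, which is exactly the cycle SSN is designed to catch.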

Can we have our cake and eat it too? The paper introduces the Serial Safety Net (SSN), a certifier that sits entirely on top of fast weak schemes like RC or SI, tracking the dependency graph and blessing a transaction only if it is serializable with respect to the others.

Figure 1 shows the core value proposition of SSN. By layering SSN onto high-concurrency but weak schemes like RC or SI, the system eliminates all dependency cycles to achieve serializability without the performance hits seen in 2PL or Serializable Snapshot Isolation (SSI).

SSN implementation

When a transaction T tries to commit, SSN computes a low watermark $\pi(T)$ (the commit stamp of T's oldest successor, i.e., the earliest-committing transaction in the future that depends on T) and a high watermark $\eta(T)$ (the commit stamp of T's newest predecessor, i.e., the latest-committing transaction in the past that T depends on). If $\pi(T) \le \eta(T)$, the past has collided with the future and a dependency cycle may have closed, so SSN aborts the transaction.
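The pre-commit test can be sketched in a few lines (an illustrative simplification with hypothetical names, not the paper's actual data structures):

```python
# Sketch of SSN's pre-commit exclusion-window test. pi(T) is the low
# watermark: the earliest commit stamp among T's successors (transactions
# that depend on T). eta(T) is the high watermark: the latest commit stamp
# among T's committed predecessors (transactions T depends on). A commit
# is allowed only while pi(T) > eta(T).

INF = float("inf")

def ssn_commit_check(successor_stamps, predecessor_stamps):
    """Return True if T may commit, False if it must abort."""
    pi = min(successor_stamps, default=INF)      # oldest future successor
    eta = max(predecessor_stamps, default=-INF)  # newest past predecessor
    return pi > eta  # pi <= eta means the past collided with the future

# A predecessor committed at stamp 7, but a successor already carries
# stamp 5: the exclusion window is violated, so T aborts.
print(ssn_commit_check(successor_stamps=[5], predecessor_stamps=[7]))  # False
print(ssn_commit_check(successor_stamps=[9], predecessor_stamps=[7]))  # True
```

Note how the whole dependency graph collapses into two numbers per transaction; that compression is what makes the check cheap, and also what causes the false positives discussed below.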

Because SSN throws out any transaction that forms a cycle, the final committed history is mathematically guaranteed to be cycle-free, and hence Serializable (SER).

Figure 2 illustrates how SSN detects serialization cycles using a serial-temporal graph. The x-axis represents the dependency order, while the y-axis tracks the global commit order. Forward dependency edges point upward, and backward edges (representing read anti-dependencies) point downward. Subfigures (a) and (b) illustrate a transaction cycle closing and the local exclusion window violation that triggers an abort: transaction T2 detects that its predecessor T1 committed after T2's oldest successor, $\pi(T2)$. This overlap proves T1 could also act as a successor, forming a forbidden loop.

Subfigures (c) and (d) demonstrate SSN's safe conditions and its conservative trade-offs. In (c), the exclusion window is satisfied because the predecessor T3 committed before the low watermark $\pi(Tx)$, making it impossible for T3 to loop back as a successor. Subfigure (d), however, shows a false positive: because SSN summarizes the complex dependency graph into just two numbers ($\pi$ and $\eta$), transaction T3 is aborted for violating its exclusion window even though no actual cycle exists yet. This strictness is necessary, though: allowing T3 to commit would be dangerous, as a future transaction could silently close the cycle later without triggering any further warnings.

SSN vs. Pure OCC

Now, you might be asking: Wait, this sounds a lot like Optimistic Concurrency Control (OCC), so why not just use standard OCC for Serializability?

Yes, SSN is a form of optimistic certification, but the mechanisms are different, and the evaluation section of the paper argues why SSN is a superior architecture for high-contention workloads.

Standard OCC does validation by checking exact read/write set intersections. If someone overwrote your data, you abort. The problem is the OCC Retry Bloodbath! When standard OCC aborts a transaction, retrying it often throws it right back into the exact same conflict because the overwriting transaction might still be active. In the paper's evaluation, when transaction retries were enabled, the standard OCC prototype collapsed badly, wasting over 60% of its CPU cycles just fighting over index insertions.

SSN, however, possesses the "Safe Retry" property. If SSN aborts your transaction T because a predecessor U violated the exclusion window, U must have already committed. When you immediately retry, the conflict is physically in the past: your new transaction simply reads U's freshly committed data, bypassing the conflict entirely. SSN's throughput stays stable under pressure while OCC falls over.


Discussion

So what do we have here? SSN offers a nice way to get to SER, while keeping decent concurrency. It proves that with a little bit of clever timestamp math, you can turn a dirty high-speed concurrency scheme into a serializable one.

Of course, no system is perfect. If you are going to deploy SSN, you have to pay the piper. Here are some critical trade-offs.

To track these dependencies, SSN requires you to store extra timestamps on every single version of a tuple in your database. In a massive in-memory system, this metadata bloat is a significant cost compared to leaner OCC implementations.
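To make that overhead concrete, here is a hedged sketch of what per-version metadata might look like. The field names (cstamp, pstamp, sstamp) loosely follow the paper's stamp notation, but the struct itself is an illustrative assumption, not any system's actual layout:

```python
# Illustrative sketch of the extra per-version stamps SSN needs.
# Every version carries several words of metadata beyond the payload,
# which is the bloat cost discussed above.

from dataclasses import dataclass

@dataclass
class TupleVersion:
    payload: bytes  # the actual row data
    cstamp: int     # commit stamp of the transaction that created this version
    pstamp: int     # latest commit stamp among transactions that read it
    sstamp: float   # successor stamp: pi of the transaction that overwrote it
                    # (infinity while the version is still the latest)

v = TupleVersion(payload=b"row-bytes", cstamp=10, pstamp=12,
                 sstamp=float("inf"))
```

In a lean OCC design a version might carry a single write timestamp; here every version pays for multiple stamps, and they must be updated on reads as well as writes.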

SSN is also not a standalone silver bullet for full serializability. While it is great at tracking row-level dependencies on existing records, it does not natively track phantoms (range-query insertions). Because an acyclic dependency graph only guarantees serializability in the absence of phantoms, you cannot just drop SSN onto vanilla RC or SI; you must actively extend the underlying CC scheme with separate mechanisms like index versioning or key-range locking to prevent them.

To wrap up the SSN discussion, let's address one final architectural puzzle. If you've been following the logic so far, you might have noticed a glaring question. The paper demonstrates that layering SSN on top of Read Committed guarantees serializability (RC + SSN = SER). It also shows that doing the exact same thing with Snapshot Isolation gets you to the exact same destination (SI + SSN = SER). If both combinations mathematically yield a serializable database, why would we ever willingly pay the higher performance overhead of Snapshot Isolation? Why would we want SI+SSN when we have RC+SSN at home?

While layering SSN on top of Read Committed (RC) guarantees a serializable outcome, it exposes your application to in-flight problems. Under RC, reads simply return the newest committed version of a record and never block, which means the underlying data can change right under your application's feet while the transaction is running. Your code might read Account A, and milliseconds later read Account B after a concurrent transfer committed, seeing a logically impossible total from an inconsistent snapshot. Even though SSN will ultimately catch this dependency cycle and safely abort the transaction during the pre-commit phase, your application logic might crash before it ever reaches that protective exit door. And even if your code survives the run, this late-abort mechanism hides a real performance penalty: the system can burn a lot of CPU and memory executing a complex, doomed transaction, only for SSN to throw all that work away at the final commit check.
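As a toy illustration (plain Python with hypothetical names, not real database code), here is how two RC reads can straddle a concurrent commit and observe a total that never existed at any single point in time:

```python
# Sketch of an in-flight inconsistent read under Read Committed: each
# read returns the newest committed value, so a transfer committing
# between two reads yields an impossible combined view.

accounts = {"A": 100, "B": 100}  # invariant: A + B == 200

def read_committed(key):
    return accounts[key]  # always the latest committed value, never blocks

a = read_committed("A")  # reads A = 100

# A concurrent transfer of 50 from A to B commits mid-flight.
accounts["A"] -= 50
accounts["B"] += 50

b = read_committed("B")  # reads B = 150

print(a + b)  # 250: a total that violates the invariant A + B == 200
```

Under SI both reads would have come from the start-time snapshot (100 + 100), so application code never sees the transient 250, which is exactly the in-flight consistency this paragraph argues for.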

This is why we gladly pay the extra concurrency control overhead for SI. Under SI, each transaction reads from a perfectly consistent snapshot of the database taken at its start time. From your application's perspective, time stops, completely shielding your code from ever seeing those transiently broken states mid-flight. However, as we mentioned in the beginning, SI still allows write skews, and pairing it with SSN covers for that to guarantee serializability. 

If you'd like to dive into this more, the authors later published a 20-page journal version here. I also found a recent follow-up by Japanese researchers here.

Here are the transactions related posts on my blog. And here are the transactions books I reviewed:
