Formal methods only solve half my problems

Original link: https://brooker.co.za/blog/2022/06/02/formal.html

## Bridging the Gap in Distributed Systems Design

While formal methods like TLA+ and P are valuable for verifying critical aspects of distributed systems, their current industry use focuses mainly on safety and liveness, preventing errors such as data corruption. These tools, however, leave key *performance* questions largely unanswered: latency, cost, scalability, hardware requirements, and behavior under load.

Today, designers rely on expensive prototyping, complex closed-form modelling, and sensitive simulations (typically Monte Carlo) to answer these performance questions. Each approach has limitations: prototyping is slow, modelling can oversimplify, and simulation takes substantial effort and is prone to bias from its modelling assumptions.

The author advocates for tools that combine the rigor of formal modelling (in languages like Pluscal or P) with the ability to analyze system performance quantitatively. Ideally, such tools would incorporate real-world data and support sensitivity analyses that reveal how system properties change with different inputs and assumptions. This would enable data-driven infrastructure investment and proactive identification of design flaws.

Ultimately, the piece calls for a shift toward quantitative design, beyond correctness alone, away from experience-based approaches, and toward broadly applicable tools that enable a more agile and reliable design process.

## Hacker News Discussion Summary: Formal Methods and System Performance

A recent Hacker News discussion, prompted by a blog post ([brooker.co.za](https://brooker.co.za)), centered on the limitations of formal methods. While tools like TLA+ can *prove* a system correct, they cannot guarantee *performance*, which is critical for real-world, large-scale distributed systems.

The general consensus was that absolute performance guarantees are usually impractical because of cost and over-provisioning. Instead, engineers focus on statistically acceptable performance, graceful degradation, and robust recovery from unexpected events. Key strategies include load balancing, auto-scaling, and appropriate backpressure.

Many commenters stressed the need to understand how systems behave under varying load, identifying saturation points and queueing behavior. Static analysis was considered insufficient; simulation and testing were preferred. Some pointed to recent research on "resource-aware" type systems (such as resource-aware machine learning) as a potential future solution, though hardware-level nondeterminism remains a major challenge.

Ultimately, the discussion emphasized that functional correctness is only a *starting point*; performance and scalability matter just as much, and the tools for reasoning about them are still immature.

Original text

At most half my problems. I have a lot of problems.

The following is a one-page summary I wrote as a submission to HPTS’22. Hopefully it’s of broader interest.

Formal methods, like TLA+ and P, have proven to be extremely valuable to the builders of large scale distributed systems [1], and to researchers working on distributed protocols. In industry, these tools typically aren’t used for full verification. Instead, effort is focused on interactions and protocols that engineers expect to be particularly tricky or error-prone. Formal specifications play multiple roles in this setting, from bug finding in final designs, to accelerating exploration of the design space, to serving as precise documentation of the implemented protocol. Typically, verification or model checking of these specifications is focused on safety and liveness. This makes sense: safety violations cause issues like data corruption and loss which are correctly considered to be among the most serious issues with distributed systems. But safety and liveness are only a small part of a larger overall picture. Many of the questions that designers face can’t be adequately tackled with these methods, because they lie outside the realm of safety, liveness, and related properties.
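To ground what model checking a safety property means before turning to the questions it leaves open, here is a purely illustrative sketch (plain Python, not TLA+, P, or any of the tooling above): it exhaustively explores every interleaving of a toy non-atomic counter protocol and reports a reachable state that violates an invariant, which is what an explicit-state model checker does at vastly larger scale.

```python
# Toy explicit-state "model checker": exhaustively explore all interleavings of a
# two-client, non-atomic read-modify-write counter and check a safety invariant.
# Purely illustrative; the protocol and invariant are invented for this sketch.
from collections import deque

def initial_state():
    # (counter, pc0, local0, pc1, local1); pc in {"read", "write", "done"}
    return (0, "read", None, "read", None)

def step(state, who):
    counter, pc0, l0, pc1, l1 = state
    pcs, locals_ = [pc0, pc1], [l0, l1]
    if pcs[who] == "read":          # read the shared counter into a local
        locals_[who] = counter
        pcs[who] = "write"
    elif pcs[who] == "write":       # write back local + 1 (not atomic with the read)
        counter = locals_[who] + 1
        pcs[who] = "done"
    else:
        return None                 # this client has no enabled step
    return (counter, pcs[0], locals_[0], pcs[1], locals_[1])

def check():
    seen, frontier = set(), deque([initial_state()])
    while frontier:                 # breadth-first search over reachable states
        s = frontier.popleft()
        if s in seen:
            continue
        seen.add(s)
        counter, pc0, _, pc1, _ = s
        # Safety invariant: once both clients are done, both increments are visible.
        if pc0 == "done" and pc1 == "done" and counter != 2:
            return ("violation", s)
        for who in (0, 1):
            nxt = step(s, who)
            if nxt is not None:
                frontier.append(nxt)
    return ("ok", None)

# Finds the lost-update interleaving: ('violation', (1, 'done', 0, 'done', 0))
print(check())
```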

What latency can customers expect, on average and in outlier cases? What will it cost us to run this service? How do those costs scale with different usage patterns, and dimensions of load (data size, throughput, transaction rates, etc)? What type of hardware do we need for this service, and how much? How sensitive is the design to network latency or packet loss? How do availability and durability scale with the number of replicas? How will the system behave under overload?
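As a taste of how a closed-form model can speak to the last of those questions, behaviour under overload, here is a minimal sketch using the textbook M/M/1 queue; the service time and the utilization points are invented for illustration, and real systems rarely match M/M/1 assumptions exactly.

```python
# Illustrative closed-form estimate (M/M/1) of how mean latency grows with load.
# The 5 ms mean service time and the utilization points are assumptions, not data.
service_time_ms = 5.0                    # assumed mean service time per request
service_rate = 1.0 / service_time_ms     # mu, in requests per millisecond

for utilization in (0.5, 0.7, 0.9, 0.95, 0.99):
    arrival_rate = utilization * service_rate               # lambda = rho * mu
    mean_latency_ms = 1.0 / (service_rate - arrival_rate)   # E[T] = 1 / (mu - lambda)
    print(f"utilization {utilization:.2f}: mean latency ≈ {mean_latency_ms:.0f} ms")
```

Even this crude model shows mean latency growing from 10 ms at 50% utilization to 500 ms at 99%, which is often the shape of answer that matters at the design stage.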

We address these questions with prototyping, closed-form modelling, and with simulation. Prototyping, and benchmarking those prototypes, is clearly valuable but too expensive and slow to be used at the exploration stage. Developing prototypes is time-consuming, and prototypes tend to conflate core design decisions with less-critical implementation decisions. Closed-form modelling is useful, but becomes difficult when systems become complex. Dealing with that complexity sometimes requires assumptions that reduce the validity of the results. Simulations, generally Monte Carlo and Markov Chain Monte Carlo simulations, are among the most useful tools. Like prototypes, good simulations require a lot of development effort, and there’s a lack of widely-applicable tools for simulating system properties in distributed systems. Simulation results also tend to be sensitive to modelling assumptions, in ways that require additional effort to explore. Despite these challenges, simulations are widely used, and have proven very useful. Systems and database research approaches are similar: prototyping (sometimes with frameworks that make prototyping easier), some symbolic models, and some modelling and simulation work [2].
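For concreteness, the following is a minimal sketch of that Monte Carlo style, not the tooling used by the author's teams; the replica latency distribution, the 1% slow-path probability, the fan-out, and the quorum size are all invented assumptions. It estimates request latency when a request fans out to five replicas and completes once three have responded.

```python
# Illustrative Monte Carlo estimate of quorum-request latency.
# All distributions and parameters below are assumptions for the sketch.
import random

def replica_latency_ms():
    base = random.lognormvariate(1.0, 0.5)      # typical response of a few ms
    if random.random() < 0.01:                  # assumed 1% slow path (GC, retry, ...)
        base += random.expovariate(1 / 50.0)    # adds ~50 ms on average
    return base

def request_latency_ms(fanout=5, quorum=3):
    # The request completes when the quorum-th fastest replica responds.
    samples = sorted(replica_latency_ms() for _ in range(fanout))
    return samples[quorum - 1]

def percentile(xs, p):
    xs = sorted(xs)
    return xs[int(p / 100 * (len(xs) - 1))]

trials = [request_latency_ms() for _ in range(100_000)]
for p in (50, 99, 99.9):
    print(f"p{p}: {percentile(trials, p):.1f} ms")
```

The fragile part is exactly what the paragraph above warns about: the output is only as good as the assumed distributions, so the slow-path probability and its cost deserve as much scrutiny as the code.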

What I want is tools that do both: tools that allow development of formal models in a language like Pluscal or P, model checking of critical parameters, and then allow us to ask those models questions about design performance. Ideally, those tools would allow real-world data on network performance, packet loss, and user workloads to be used, alongside parametric models. The ideal tool would focus on sensitivity analyses, that show how various system properties vary with changing inputs, and with changing modelling assumptions. These types of analyses are useful both in guiding investments in infrastructure (“how much would halving network latency reduce customer perceived end-to-end latency?”), and in identifying risks of designs (like finding workloads that perform surprisingly poorly).
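A hedged sketch of what such a sensitivity analysis could look like follows: sweep one modelling assumption, the one-way network latency, and watch how an end-to-end percentile responds. The two-round-trip request path and the service-time distribution are illustrative assumptions, not a real system.

```python
# Illustrative sensitivity analysis: how does end-to-end p99 move as the assumed
# one-way network latency changes? Path shape and distributions are assumptions.
import random

def end_to_end_ms(network_ms):
    # Assumed path: two round trips (four one-way hops) plus server-side work.
    service_ms = random.lognormvariate(1.0, 0.4)
    return 4 * network_ms + service_ms

def p99(samples):
    s = sorted(samples)
    return s[int(0.99 * (len(s) - 1))]

for network_ms in (0.25, 0.5, 1.0, 2.0):
    trials = [end_to_end_ms(network_ms) for _ in range(50_000)]
    print(f"one-way network latency {network_ms:.2f} ms -> p99 ≈ {p99(trials):.1f} ms")
```

Running the same sweep over the modelling assumptions themselves (the service-time distribution, the number of hops) is what separates a sensitivity analysis from a single point estimate.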

This is an opportunity for the formal methods community and systems and database communities to work together. Tools that help us explore the design space of systems and databases, and provide precise quantitative predictions of design performance, would be tremendously useful to both researchers and industry practitioners.

Later commentary

This gap is one small part of a larger gap in the way that we, as practitioners, design and build distributed systems. While we have some in-the-small quantitative approaches (e.g. reasoning about device and network speeds and feeds), some widely-used modelling approaches (e.g. Markov modelling of storage and erasure code durability), most of our engineering approach is based on experience and opinion. Or, worse, à la mode best-practices or “that’s how it was in the 70s” curmudgeonliness. Formal tools have, in the teams around me, made a lot of the strict correctness arguments into quantitative arguments. Mental models like CAP, PACELC, and CALM have provided ways for people to reason semi-formally about tradeoffs. But I haven’t seen a similar transition for other properties, like latency and scalability, and it seems overdue.
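As a small example of the in-the-small Markov modelling mentioned above, here is an illustrative sketch that estimates the chance of losing all copies of an object in a three-replica system; the failure rate, repair rate, time step, and one-repair-at-a-time assumption are invented for the sketch rather than taken from any real fleet.

```python
# Illustrative Markov-chain durability model for a 3-replica object.
# State k = number of failed replicas; state 3 (all copies lost) is absorbing.
# Rates, horizon, and the single-repair assumption are made up for this sketch.
failure_rate = 1.0 / 8760.0   # assumed: each replica fails about once per year (per-hour rate)
repair_rate = 1.0 / 4.0       # assumed: a failed replica is re-replicated in ~4 hours
dt = 0.1                      # hours per step; small relative to 1/repair_rate
hours = 10 * 8760             # horizon: 10 years

p = [1.0, 0.0, 0.0, 0.0]      # p[k] = probability of k failed replicas
for _ in range(int(hours / dt)):
    q = p[:]
    for k in (0, 1, 2):       # a surviving replica fails
        fail = (3 - k) * failure_rate * p[k] * dt
        q[k] -= fail
        q[k + 1] += fail
    for k in (1, 2):          # one repair in flight at a time
        rep = repair_rate * p[k] * dt
        q[k] -= rep
        q[k - 1] += rep
    p = q

print(f"P(all copies lost within 10 years) ≈ {p[3]:.2e}")
```

The same few lines make it easy to ask "what if repair takes a day?" or "what about four replicas?", which is the kind of question the quantitative-design argument below is about.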

Quantitative design has three benefits: it gives us a higher chance of finding designs that work, it forces us to think through requirements very crisply, and it allows us to explore the design space nimbly. We’ve very successfully applied techniques like prototyping and ad hoc simulation to create a partially quantitative design approach, but it seems like it’s time for broadly applicable tools.

Footnotes

  1. See, for example, Using Lightweight Formal Methods to Validate a Key-Value Storage Node in Amazon S3, and How Amazon Web Services Uses Formal Methods
  2. E.g. the classic Concurrency control performance modeling: alternatives and implications, from 1987.