How randomness improves algorithms (2023)

原始链接: https://www.quantamagazine.org/how-randomness-improves-algorithms-20230403/

## The Surprising Power of Randomness in Computing

Since the earliest days of computer science, randomness has been a surprisingly powerful tool. First used to simulate complex processes such as nuclear reactions, it is now essential for tackling hard problems in fields like astrophysics and economics. Remarkably, randomness is not just for coping with uncertainty; it can also solve precisely defined problems efficiently.

Take primality testing, that is, determining whether a number is prime. Traditional methods are slow, but algorithms based on Fermat's little theorem use random numbers to quickly probe a number's divisibility. A single test is not foolproof, but repeated trials with random inputs yield near-certain results.

This success sparked a broader trend: reframing problems so that they become easy to solve with *any* randomly chosen value. Although it seems paradoxical to use randomness to uncover structure, research shows that these randomized solutions usually have deterministic counterparts, or else notoriously hard problems are secretly easy.

While "de-randomizing" an algorithm is usually possible, it is often impractical. Recent breakthroughs, such as a fast algorithm for finding shortest paths in complicated graphs, demonstrate the continued relevance of randomness, which can produce solutions even where deterministic approaches are infeasible. Randomness, it seems, is a fundamental and enduring technique in the computing world.

## Randomness in Algorithms: Discussion Summary

A recent Quanta Magazine article sparked a discussion on Hacker News about the benefits of building randomness into algorithms. The core idea, advocated as early as Trevor Blackwell's 1998 doctoral thesis, is that introducing a random component can stabilize a system's performance and make measurements more reliable by blunting the effect of specific, potentially pathological inputs.

Commenters discussed how randomness helps avoid worst-case behavior, especially against adversarial inputs designed to exploit deterministic algorithms. Examples include using prime numbers for server counts and request limits, and how randomness smooths out clustered workloads. Notably, this does not require *true* randomness; a good pseudorandom number generator (PRNG) is usually enough.

One key point was the theoretical connection to complexity classes: if P ≠ NP, then every randomized algorithm has a deterministic equivalent. "De-randomization" refers to finding that deterministic substitute, or to improving the sampling method beyond naive randomness. The discussion also touched on how to frame the topic: it is less about anything "mysterious" and more about efficiently sampling from a set of values when identifying the optimal one directly would be too expensive.
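The worst-case point is easiest to see concretely. The sketch below is my own illustration of the pattern, using randomized pivot selection in quicksort (an example not named in the summary above): a fixed deterministic rule has inputs that reliably break it, while a random rule leaves an adversary nothing specific to target.

```python
import random

def quicksort(items, pick_pivot):
    """Plain quicksort; its worst-case behavior depends entirely on pivot choice."""
    if len(items) <= 1:
        return items
    pivot = pick_pivot(items)
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return quicksort(smaller, pick_pivot) + equal + quicksort(larger, pick_pivot)

# Deterministic rule: always use the first element. An adversary who knows this
# rule can feed already-sorted input and force quadratic behavior.
first_element = lambda items: items[0]

# Randomized rule: no single input is bad for it, only unlucky coin flips, and
# those are overwhelmingly unlikely to repeat at every level of the recursion.
random_element = lambda items: random.choice(items)

sorted_input = list(range(2000))
print(quicksort(sorted_input, random_element) == sorted_input)  # True, and fast
# quicksort(sorted_input, first_element) would recurse ~2000 levels deep here
# and overflow Python's default recursion limit.
```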

## Original Article

Since the very first days of computer science — a field known for its methodical approach to problem-solving — randomness has played an important role. The first program to run on the world’s first general-purpose electronic computer used randomness to simulate nuclear processes. Similar approaches have since been used in astrophysics, climate science and economics. In all these cases, plugging in random numbers at certain steps in the algorithm helps researchers account for uncertainty about the many ways that complex processes can play out.

But adding randomness into an algorithm can also help you calculate the correct answer to unambiguous true-or-false questions. “You just say ‘OK, let me give up, let me not try, let me just pick something at random,’” said Eric Blais, a computer scientist at the University of Waterloo. “For way too many problems, that ends up being a successful approach.”

Let’s say you want to determine whether a given number is prime (divisible only by 1 and itself) or composite (also divisible by other integers). You could simply try to divide it by all possible factors, but for large numbers this “brute force” method and other factoring algorithms are excruciatingly slow. And if the number turns out to be composite, factoring algorithms tell you the values of its divisors — more information than you asked for. If you care only about a number’s “primality,” is there a more efficient algorithm?
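For contrast with the randomized test described next, here is a minimal sketch of that brute-force approach (checking candidate divisors only up to √n, a standard shortcut the article does not spell out):

```python
def is_prime_trial_division(n):
    """Brute-force primality: try every possible divisor up to sqrt(n).
    Always correct, but far too slow for very large numbers."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False  # found a divisor, so n is composite
        d += 1
    return True

print(is_prime_trial_division(97))   # True
print(is_prime_trial_division(91))   # False (7 * 13)
```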

There is if you use randomness. The basic idea goes back to a result from the 17th-century French mathematician Pierre de Fermat, known as his “little theorem.” Fermat considered two integers — call them N and x. He proved that if N is a prime number, then x^N − x is always a multiple of N, regardless of the value of x. Equivalently, if x^N − x is not a multiple of N, then N can’t be a prime number. But the converse isn’t always true: If x^N − x is a multiple of N, then N is usually but not always prime.
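A quick numerical check of both directions, with numbers of my own choosing:

```latex
\begin{align*}
N = 7 \ (\text{prime}),\ x = 3: &\quad 3^{7} - 3 = 2184 = 312 \cdot 7
  \quad \text{(a multiple of 7, as the theorem requires)} \\
N = 9 \ (\text{composite}),\ x = 2: &\quad 2^{9} - 2 = 510 = 56 \cdot 9 + 6
  \quad \text{(not a multiple of 9, which certifies that 9 is composite)}
\end{align*}
```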

To turn Fermat’s little theorem into a primality test, just take the N that you’re interested in, choose x at random, and plug the two numbers into x^N − x. If the result is not a multiple of N, then you’re done: You know that N is definitely composite. If the result is a multiple of N, then N is probably prime. Now pick another random x and try again. In most cases, after a few dozen tries, you can conclude with near certainty that N is a prime number. “You do this a small number of times,” Blais said, “and somehow now your probability of having an error is less than the probability of an asteroid hitting the Earth between now and when you look at the answer.”
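A minimal sketch of that repeated test, using Python's built-in three-argument pow for modular exponentiation. The choice of 40 rounds is arbitrary, and this basic version can still be fooled by rare Carmichael numbers, one reason the first practical tests mentioned below used refinements of the theorem rather than this exact form:

```python
import random

def fermat_test(n, rounds=40):
    """Probabilistic primality test as described in the article: pick a random x,
    check whether x^n - x is a multiple of n (i.e. x^n is congruent to x mod n),
    and repeat. A single failure proves n composite; repeated success makes
    primality overwhelmingly likely (rare Carmichael numbers excepted)."""
    if n < 4:
        return n in (2, 3)
    for _ in range(rounds):
        x = random.randrange(2, n - 1)
        if pow(x, n, n) != x % n:   # x^n - x is not a multiple of n
            return False            # definitely composite
    return True                     # almost certainly prime

print(fermat_test(2**61 - 1))   # a known Mersenne prime: True
print(fermat_test(2**61 + 1))   # composite: False with near certainty
```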

The first primality tests using randomized algorithms (based on refinements to Fermat’s little theorem) ushered in a new era. Problem after problem turned out to be far easier to solve with randomness than with nonrandom, or deterministic, algorithms. The key was to reframe each problem as one that could be quickly solved given an appropriate value for some number x, and then prove that just about any x would do. The solution works even though researchers have no idea how to determine whether any specific choice is a good one. Mathematicians have joked that this unusual challenge is akin to finding hay in a haystack.

But these successes made researchers wonder why randomness should help with problems like primality testing, which are all about finding hidden, non-random patterns. “There’s something a bit paradoxical about it,” said Rahul Santhanam, a computer scientist at the University of Oxford. “Pure randomness is helping you get a handle on the structure that solves the problem.”

In 1994, the computer scientists Noam Nisan and Avi Wigderson helped resolve this confusion by demonstrating that randomness, though useful, probably isn’t necessary. They proved that one of two things must be true: Either all problems that can be efficiently solved using randomness also have fast deterministic algorithms, or many notoriously difficult problems are secretly easy. Computer scientists consider the second possibility very unlikely.

In fact, computer scientists often find it easier to develop a deterministic algorithm by starting with a randomized version and then “de-randomizing” it. “Once I have it, I suddenly see a very obvious way to make it deterministic,” said Eli Upfal, a computer scientist at Brown University. “But if I didn’t think about it in a randomized way as a probabilistic question, I probably would not think of it.”

Nearly 30 years after Nisan and Wigderson’s landmark proof, randomized algorithms remain as popular as ever, because de-randomization can be tricky and deterministic algorithms are often efficient only in principle. It wasn’t until 2002 that three researchers found a way to de-randomize primality testing, and in practice their algorithm is far slower than the best randomized algorithms. For other problems, it’s hard even to know where to begin — the best known algorithm has a chicken-and-egg problem that you can only escape through randomness.

That’s the case for a recent breakthrough in graph theory. Last year, three computer scientists developed a fast algorithm for finding the shortest path through a graph — a web of nodes connected by line segments — that works even when some segments subtract from the total path length rather than add to it. Their algorithm involved transforming the graph into a simpler one by deleting certain segments, solving the problem for the simplified graph, and then accounting for the deleted segments. They could prove that the algorithm would run quickly if no shortest path passed through too many deleted segments — otherwise, the last step would take too long.

But how to decide which segments to delete in the first place? It’s not just hard to find the ideal set of segments deterministically — it’s impossible. The set depends on which paths are shortest, the very problem the three researchers were trying to solve. But even though they couldn’t find the best set of segments to delete, they could prove that most random choices would be pretty good, and that was enough to break the self-referential loop. In the rare cases where the algorithm makes an unlucky choice and gets bogged down at the last step, they could just stop and run it again.
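That verify-and-rerun pattern is simpler to see in a toy setting than in the shortest-path algorithm itself. The sketch below is my own generic illustration, not the authors' method: it picks a random splitter for a list, checks after the fact that the split is reasonably balanced, and retries on an unlucky draw.

```python
import random

def balanced_splitter(values):
    """We cannot cheaply compute the ideal splitter (the median), but most random
    picks land reasonably close to it, so verify the pick after the fact and
    simply retry on the rare unlucky draw."""
    n = len(values)
    while True:
        candidate = random.choice(values)
        smaller = sum(1 for v in values if v < candidate)
        # "Pretty good" check: at least a quarter of the values fall on each side.
        # A uniformly random pick passes this roughly half the time, so the
        # expected number of retries is small regardless of the input.
        if n // 4 <= smaller <= (3 * n) // 4:
            return candidate
        # Otherwise: an unlucky choice, so stop and run it again.

data = list(range(1000))
random.shuffle(data)
print(balanced_splitter(data))  # a value comfortably away from both extremes
```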

“Randomness is basically a way to ensure that something is true about the optimal solution without knowing the optimal solution,” said Aaron Bernstein, one of the authors of the new algorithm.

Randomness has found countless other uses in computer science, from cryptography to game theory to machine learning. Chances are, it’s here to stay.
