本科生推翻40年旧猜想,发明新型哈希表
Undergraduate Disproves 40-Year-Old Conjecture, Invents New Kind of Hash Table

原始链接: https://www.wired.com/story/undergraduate-upends-a-40-year-old-data-science-conjecture/

四十年来,计算机科学家一直相信姚期智在1985年提出的猜想:均匀探测是某种哈希表的最优方法,其最坏情况查询时间为“x”。研究员Krapivin在不知晓姚氏工作的情况下,开发了一种使用小型指针的新型哈希表,这与姚氏的猜想相矛盾。他实现了(log x)²的最坏情况查询和插入时间,显著快于姚氏预测的“x”。 他和Farach-Colton及Kuszmaul合作,证明了(log x)²是这类哈希表的最佳界限,从而推翻了姚氏的猜想。 此外,该团队还驳斥了姚氏关于“贪婪”哈希表平均查询时间不可能优于log x 的说法。他们展示了一个非贪婪哈希表,其平均查询时间为常数,与哈希表的填充程度无关。这一意外发现挑战了关于哈希表性能的传统观点,无论哈希表有多满,都能提供恒定的平均查询时间。这项成果可能没有立即的应用,但它提升了我们对数据结构的理解。

一篇Hacker News帖子的简要总结:据wired.com报道,本科生Krapivin推翻了一个存在40年的猜想,并发明了一种新型哈希表。评论者taylordl指出,Krapivin的成功源于他对既定常识的无知,这与现代物理学中知识积累有时会阻碍真正创新的观点相呼应。另一位用户hullo提供了2012年原始Hacker News讨论的链接。该帖子还包含标准的Hacker News页脚信息,包括指向指南、常见问题解答、法律信息和其他资源的链接。此外还包含AI创业学校的公告。

原文

In a 1985 paper, the computer scientist Andrew Yao, who would go on to win the A.M. Turing Award, asserted that among hash tables with a specific set of properties, the best way to find an individual element or an empty spot is to just go through potential spots randomly—an approach known as uniform probing. He also stated that, in the worst-case scenario, where you’re searching for the last remaining open spot, you can never do better than x. For 40 years, most computer scientists assumed that Yao’s conjecture was true.

Krapivin was not held back by the conventional wisdom for the simple reason that he was unaware of it. “I did this without knowing about Yao’s conjecture,” he said. His explorations with tiny pointers led to a new kind of hash table—one that did not rely on uniform probing. And for this new hash table, the time required for worst-case queries and insertions is proportional to (log x)2—far faster than x. This result directly contradicted Yao’s conjecture. Farach-Colton and Kuszmaul helped Krapivin show that (log x)2 is the optimal, unbeatable bound for the popular class of hash tables Yao had written about.

“This result is beautiful in that it addresses and solves such a classic problem,” said Guy Blelloch of Carnegie Mellon.

“It’s not just that they disproved [Yao’s conjecture], they also found the best possible answer to his question,” said Sepehr Assadi of the University of Waterloo. “We could have gone another 40 years before we knew the right answer.”

Krapivin on the King’s College Bridge at the University of Cambridge. His new hash table can find and store data faster than researchers ever thought possible.

Photoraph: Phillip Ammon for Quanta Magazine

In addition to refuting Yao’s conjecture, the new paper also contains what many consider an even more astonishing result. It pertains to a related, though slightly different, situation: In 1985, Yao looked not only at the worst-case times for queries, but also at the average time taken across all possible queries. He proved that hash tables with certain properties—including those that are labeled “greedy,” which means that new elements must be placed in the first available spot—could never achieve an average time better than log x.

Farach-Colton, Krapivin, and Kuszmaul wanted to see if that same limit also applied to non-greedy hash tables. They showed that it did not by providing a counterexample, a non-greedy hash table with an average query time that’s much, much better than log x. In fact, it doesn’t depend on x at all. “You get a number,” Farach-Colton said, “something that is just a constant and doesn’t depend on how full the hash table is.” The fact that you can achieve a constant average query time, regardless of the hash table’s fullness, was wholly unexpected—even to the authors themselves.

The team’s results may not lead to any immediate applications, but that’s not all that matters, Conway said. “It’s important to understand these kinds of data structures better. You don’t know when a result like this will unlock something that lets you do better in practice.”


Original story reprinted with permission from Quanta Magazine, an editorially independent publication of the Simons Foundation whose mission is to enhance public understanding of science by covering research developments and trends in mathematics and the physical and life sciences.

联系我们 contact @ memedata.com