(评论)
(comments)
原始链接: https://news.ycombinator.com/item?id=40988837
1978 年,著名著作《C 编程语言》的作者 Brian Kernighan 表示,哈希表被认为“有点奇怪”。 然而,这一说法与历史证据相矛盾,历史证据表明哈希表在 20 世纪 60 年代和 1970 年代的计算机科学中得到了普遍使用和研究。 哈希表通常称为关联检索结构,是编译器、汇编器、链接器、数据库和文件系统中的标准功能。 它们早在 1973 年就出现在 Knuth 备受推崇的“计算机编程艺术”系列中,并且在整个 1970 年代,Communications of the ACM 上发表了大量讨论哈希表和相关主题的文章。 此外,Kernighan 本人也为其中许多出版物做出了贡献,这表明他确实熟悉这项技术。
Kernighan 的评论似乎是因为他专注于让编程语言更简单、更容易让初学者学习,主张将哈希表作为核心库结构,而不是要求用户手动实现它们。 到 1979 年,当 Unix 版本 7 发布时,以 Kernighan 的脚本语言 AWK 为特色,哈希表不再被视为深奥,而是编程中的常规工具。 如今,哈希表已成为现代编程语言(例如 Python、C++ 和 Java)中普遍存在的组件。
值得注意的是,第一篇讨论哈希表的论文由 Donald E. Knuth 撰写,可以追溯到 1968 年,而计算机协会 (ACM) 的 ACM 杂志的 Communications 中最早引用哈希表的是 1959 年此外,这个概念早于这两个里程碑,起源于 1953 年的 IBM 内部,并于 1956 年被 Dumas 记录在公开文献中。因此,考虑到哈希表已经有 20 多年的历史,Kernighan 对哈希表的描述是新颖的。 1978 年出版《C 编程语言》。
— kernighan definitely did not think hash tables were 'a bit exotic' in 01978 —
hash tables (or sometimes other associative retrieval structures) were standard fare in computer science degree programs and were a standard feature of compilers, assemblers, and linkers, of which kernighan had written several at that point; associative retrieval structures are also fundamental to databases and filesystems. in 01979, 7th edition unix (in which kernighan published awk) barely had date formatting but included an on-disk hash table library called 'dbm'
hash tables are the exclusive topic of pp. 506–549 of knuth's taocp volume 3, a standard textbook which came out in 01973. after recommending you some hash functions, on p. 512 (525/739) knuth recommends you check out a survey of hash functions published in cacm by lum, yuen, and dodd in 01971, and he cites papers on hashing in cacm as early as 01959 and in an ibm journal as early as 01957. in the 'history' section (pp. 540–542 (553–555/739)) he traces it back to an internal ibm report by luhn in 01953, and its first report in the open literature to a paper by dumey in 01956. most of the rest of the 739-page book is about how to implement associative arrays of various kinds, although some parts instead concern the string search problem, and the first half is about sorting, and a sorted table can be used for purposes other than associative retrieval
knuth also mentions a 'very influential survey' of hashing published in cacm in 01968 by robert morris, kernighan's coworker at bell labs, and one of the very first users of unix and a major contributor to it
there are several papers about hashing every year in cacm throughout the 01970s, as well as alternative associative-array structures such as tries, sorted arrays and binary search trees. in march 01975 we have https://cacm.acm.org/research/the-algorithm-sequential-acces..., in april 01975 we have https://cacm.acm.org/research/the-quadratic-hash-method-when..., in may 01975 we have https://cacm.acm.org/research/analysis-and-performance-of-in..., in july 01975 we have https://cacm.acm.org/research/a-note-on-hash-linking/, in september 01975 we have https://cacm.acm.org/research/multidimensional-binary-search..., in october 01975 we have https://cacm.acm.org/research/optimizing-the-performance-of-... and https://cacm.acm.org/research/merging-with-parallel-processo..., in january 01976 we have https://cacm.acm.org/research/performance-of-height-balanced..., in february 01976 we have https://cacm.acm.org/research/on-self-organizing-sequential-... and https://cacm.acm.org/research/a-stochastic-evaluation-model-..., in june 01976 we have https://cacm.acm.org/research/a-practitioners-guide-to-addre... (which is about hashing despite the name), in july 01976 we have https://cacm.acm.org/research/compressed-tries/, and in august 01976 we have https://cacm.acm.org/research/an-insertion-technique-for-one.... you could not read cacm in the 01960s and 01970s without frequently coming across hashing, as well as other associative-array structures.
but perhaps kernighan hadn't read cacm? on the contrary, he and his colleagues regularly published in it; he published in cacm once in 01972, once in 01973, and once in 01975, and his colleagues such as stephen johnson, mike lesk, al aho (his coauthor on awk), and, as i said above, robert morris, published there regularly as well
in the practice of programming, kernighan and pike claim that almost all of the time, the only data structures you need are arrays, linked lists, trees, and hash tables (though they wrote this in 01999)
what kernighan is talking about here is including them as a basic language construct, a design choice his 01978 tech report on awk likens to that of snobol4, which was a well-known language but not a mainstream one. in the context of the above cacm articles, you can perhaps see that the daring choice was to offer a single 'good enough' hash table implementation rather than one optimized for a given purpose—and to use it even in place of conventional arrays
— kernighan was not isolated like you were —
kernighan was at bell labs, which was probably the most connected place in the software world at the time
when awk was released in 01979 the internet was already ten years old, and at&t (the parent company of bell labs) had installed all its long-distance links
in 01976 they'd also developed their own 'internet' called uucp, which at the time of awk's release was interconnecting 80 different unix sites, interchanging software updates on a daily basis, over 300-baud and 1200-baud modems. they were also in contact with ken thompson's alma mater berkeley; thompson took a sabbatical year to teach there in 01975. berkeley got a darpa contract to add tcp/ip support to the bell labs operating system 'unix', which would become the majority of the internet during the 01980s; berkeley participated actively in the uucp network
bell labs was also working on the so-called 'advanced communication system', its own misconceived version of the internet, where data would be routed using phone numbers, because data communications had become the fastest-growing source of call traffic. (colin berkshire has written a fantastic overview of this forgotten history.)