Stochastic computing

Original link: https://scottlocklin.wordpress.com/2025/10/31/stochastic-computing/

## Stochastic computing: revival of an old idea

Stochastic computing is an unconventional computing architecture now being revived through the efforts of startups such as Extropic. The concept dates back to the 1960s, pioneered by Brian Gaines and Wolfgang Poppelbaum, and offered a competitive alternative to early supercomputers. Instead of precise voltages, it represents data as streams of random bits and computes by counting them, essentially a digital approach to analog computing.

This approach is noise tolerant and inherently parallel, which makes it particularly well suited to machine learning tasks. In an era when fitting even simple machine learning models was computationally expensive, stochastic computing offered a practical path, especially for building neural networks that mimic biological systems.

Advances in modern chips, with billions of transistors and fast serial links, make stochastic computing more promising still. It could bring substantial energy-efficiency gains to deep neural networks (potentially up to 10,000x) and offers a more natural way to implement probabilistic models. While current implementations remain aspirational, the core principles present a potentially breakthrough alternative to today's computing paradigms.


Original article

I’ve wanted to write about this topic since I started this blerg, but various things have kept me from it. Stochastic computing is a subject I’ve been somewhat aware of since I went through the “Advances in Computers” series in the LBNL library while procrastinating over finishing my PhD thesis. It’s an old idea, and the fact that sci-hub exists now means I can look up some of these old papers for a little review. The fact that there is a startup reviving the idea (their origin paper here) gives a little impetus to do so.

Von Neumann had a vague idea along the lines of stochastic computing in the 1950s, but the first actual stochastic computing architectures were built more or less independently by Brian Gaines and Wolfgang (Ted) Poppelbaum in the mid-1960s. I didn't take any notes when I read about it in "Advances in Computers," but it always remained stuck in my noggin. The once-yearly journal started in 1960 and I at least skimmed the abstracts all the way to 2002 or whenever it was I was procrastinating (physics libraries are way better than the internet/wikipedia for this sort of thing). My takeaway was that Tukey's FFT was godawful important, there were a few cool alternatives to normal computer architectures, and climatological calculations have always been more or less the same thing (aka lame), just with bigger voxels in the old days.

Stochastic computing was one of the cool alternatives; it was competitive with supercomputers back in the 1960s, and it was something a small research group could physically build. As a reminder, a $25 million (current-year dollars) CDC 6600 supercomputer of 1964 did two megaflops with space for 970 kilobytes of RAM. They made about 100 of them total for the whole world. An NVIDIA Titan RTX has 24GB and runs at something like 130 teraflops. Your phone is probably 2 teraflops. Something like stochastic gradient descent was, therefore, not worth thinking about even on a supercomputer in those days. Fitting a perceptron or a Hidden Markov model was a considerable technical achievement. People would instead use various kinds of analog computers for early machine learning research. That includes matrix operations, which were modeled with arrays of resistors. Analog computers can do matrix math in O(1); it only takes as long as the voltage measurement takes. Analog used to be considered parallel computation for this reason.

One can think of stochastic computing as a way of doing a sort of analog computing digitally. Instead of continuous voltage levels, stochastic computing uses streams of random bits (of known distribution) and counts them. This has many advantages: counting bits is pretty easy to do reliably, whereas measuring precise voltages and reliably converting them back into bits requires much more hardware. It's also more noise immune, in that a noise event is an extra bit or two, which doesn't affect the calculation as much as some unknown voltage would affect an analog machine. Thinking about this a different way, if you're dealing with machine learning, the numbers being piped around represent probabilities. A bit of noise hitting a stochastic stream of bits might add a couple of bits and is unlikely to make the overall probability greater than 1. If your probability is an analog voltage level, any old noise is very likely to botch your probability and turn it into some ad-hoc Dempster-Shafer thing (which hadn't been invented yet) where probabilities sum to greater than 1. While I'm talking about probabilities here, let it be known that the standard mathematical operations are defined in these computers: addition, multiplication (including for matrices), even calculus, just like with analog computers.
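To make that concrete, here is a minimal software sketch of unipolar stochastic arithmetic, where a value in [0, 1] is encoded as the probability of seeing a 1 in a bit stream. The function names and the pure-Python simulation are my own illustration, not code from any historical machine or from Extropic.

```python
# Toy sketch of unipolar stochastic arithmetic: a value p in [0, 1] is
# represented as a bit stream with P(bit = 1) = p and recovered by counting.
import random

def encode(p, n):
    """Encode a value p in [0, 1] as an n-bit stream with P(bit = 1) = p."""
    return [1 if random.random() < p else 0 for _ in range(n)]

def decode(stream):
    """Recover the value by counting ones."""
    return sum(stream) / len(stream)

def multiply(a, b):
    """Bitwise AND of two *independent* streams computes p_a * p_b."""
    return [x & y for x, y in zip(a, b)]

def scaled_add(a, b, select):
    """A multiplexer driven by a 0.5-probability select stream computes (p_a + p_b) / 2."""
    return [x if s else y for x, y, s in zip(a, b, select)]

n = 100_000
a, b, sel = encode(0.3, n), encode(0.6, n), encode(0.5, n)
print(decode(multiply(a, b)))         # ~0.18 = 0.3 * 0.6
print(decode(scaled_add(a, b, sel)))  # ~0.45 = (0.3 + 0.6) / 2
```

In hardware the multiply really is a single AND gate per pair of streams, and the scaled add a single multiplexer, which is where much of the cheapness and parallelism comes from.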

Noise immunity is quite a useful property for 1960s-era computers, especially in noisy "embedded" environments. TTL was available, but noisy and not particularly reliable for digital calculations. It was contemplated that such machines would be useful in aircraft avionics, and the technology actually showed up in a small early autonomous neural-net robot, was tested in a British version of LORAN, and was used in a radar tracker and a PID system.

The early stochastic computers were generally parts of some kind of early neural net thing. This was a reasonable thing to do, as it was known back then that animal neural nets are pulse-rate encoded. The idea lives on in the liquid state machine and other spiking neural nets, a little-loved neural net design which works sort of like actual neurons, but which is quite difficult to simulate on an ordinary computer that has to calculate serially. It's a piece of cake for stochastic computers, more or less, as stochastic computers include all the parts you need to build one, rather than simulating pulses flying around. Similarly, probabilistic graphical models are easier to do on this sort of architecture, as you're modeling probabilities in a straightforward way, rather than via the indirect random sampling used in PGM packages. Brian Gaines puts the general machine learning problem extremely well: "the processing of a large amount of data with fairly low accuracy in a variable, experience-dependent way." This is something stochastic machines are well equipped to deal with. It's also a maxim repeatedly forgotten and remembered in the machine learning community: dropout, ReLU, regularization, and stuff like random forests are all rediscoveries of the concept.

This isn't a review article, so I'm not mentioning a lot of the gritty details, but these things have a lot of nice properties. The bit streams you're operating on don't actually need to be that random, just uncorrelated, and there is standard circuitry for dealing with this. Lots of the circuits are extremely simple compared to their analogs on a normal sequential machine; memory is just a delay line, for example. And you can dial the precision up or down simply by counting more or fewer bits.
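A quick sketch of those two properties, same toy unipolar encoding as the snippet above (illustrative Python, not hardware): precision is just stream length, and correlated streams give wrong answers, which is why real designs decorrelate with things like independent pseudo-random sources or delay lines.

```python
# Toy demonstration: (1) precision scales with stream length,
# (2) streams interacting through an AND gate must be uncorrelated.
import random

def encode(p, n):
    return [1 if random.random() < p else 0 for _ in range(n)]

def decode(stream):
    return sum(stream) / len(stream)

# 1) The counting error shrinks roughly as 1/sqrt(n), so you buy precision
#    with time (more bits counted) rather than with wider hardware.
for n in (100, 10_000, 1_000_000):
    print(n, abs(decode(encode(0.7, n)) - 0.7))

# 2) Correlation ruins the arithmetic: AND-ing a stream with itself gives
#    back p rather than p*p.
a, b = encode(0.5, 100_000), encode(0.5, 100_000)
print(decode([x & y for x, y in zip(a, a)]))  # ~0.50, wrong for p*p
print(decode([x & y for x, y in zip(a, b)]))  # ~0.25, correct
```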

Remember, all this 1960s stuff was done with individual transistors and early TTL ICs with a couple of devices on a chip. It was considered useful back then, competitive with and a lot cheaper than ordinary computers. Now we can put billions of far more reliable transistors down on a single chip, and probably even more unreliable ones, which is what stochastic computers were designed around. Modern chips also have absurdly fast and reliable serial channels, like the links AMD chiplets use to talk to each other. You can squirt a lot of bits around very quickly; much more quickly than in the 1960s. Oh yeah, and it's pretty straightforward to map deep neural nets onto this technology, which was postulated back in 2017 to be much more energy efficient than what we're doing now. You can even reduce the voltage and deal with the resulting noise much more gracefully than a regular digital computer can.
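As a rough illustration of what "mapping a deep net onto this" means at the lowest level, here is a toy stochastic dot product, the basic operation under a neural-net layer: each multiply is an AND gate on a pair of bit streams and the accumulate is just a counter. This is my own sketch of the general idea, not the 2017 proposal's architecture and not anything Extropic has published.

```python
# Toy stochastic dot product: AND gates do the multiplies, a counter does the
# accumulation. Inputs and weights are assumed to lie in [0, 1] (unipolar coding).
import random

def encode(p, n):
    return [1 if random.random() < p else 0 for _ in range(n)]

def stochastic_dot(inputs, weights, n=100_000):
    """Approximate sum_i x_i * w_i by counting AND-ed bit streams;
    each product term gets its own independent pair of streams."""
    total_ones = 0
    for x, w in zip(inputs, weights):
        xs, ws = encode(x, n), encode(w, n)
        total_ones += sum(a & b for a, b in zip(xs, ws))
    return total_ones / n  # the counter can sum past 1.0, unlike a single stream

print(stochastic_dot([0.2, 0.9, 0.5], [0.5, 0.1, 0.8]))  # ~0.59
```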

I assume this is what the Extropic boys are up to. I confess I looked at their website and one of the videos and had no idea WTF they were talking about. Based Beff Jezos showing us some Josephson gate prototype that has nothing to do with anything (other than being a prototype; no idea why it had to be superconducting) and babbling about quantum whatevers did nothing for me other than rustle my jimmies. Anyway, if their "thermodynamic computing" is stochastic computing, it is a very good idea. If it is something else, stochastic computing is still a very good idea and I finally wrote about it. They're saying impressive things like 10,000x energy efficiency, which of course is aspirational at the moment, but for the sorts of things neural nets are doing, it seems doable to me, and very much worth doing. In my ideal world, all the capital that has been allocated to horse shit quasi-scams like OpenAI and all the dumb nuclear power startups around it (to say nothing of the quantum computard startups, which are all scams) would be dumped into stuff like this. At least we'd have a decent probability of getting better computards. It's different enough to be a real breakthrough if it works properly, and there are no obviously impossible lacunae as there are with quantum computards.

Brian Gaines publications:

https://gaines.library.uvic.ca/pubs/

Good recent book on the topic:

Extropic patents:
https://patents.google.com/?assignee=extropic&oq=extropic
