Everything old is new again: memory optimization

Original link: https://nibblestew.blogspot.com/2026/03/everything-old-is-new-again-memory.html

With AI copyright-infringement factories consuming vast amounts of memory, the memory available in consumer devices is shrinking, and memory efficiency is once again an urgent concern. This raises two questions: is optimizing worth it, and how much improvement is even possible? A comparison of a Python word-counting script and a native C++ one illustrates the point. The concise Python version uses 1.3 MB of memory. The C++ version, which uses string views to avoid creating unnecessary string objects, implements the same functionality in only about 100 kB, a 92.3% reduction. Going further, disabling exception handling in the C++ build drops memory usage to just 21 kB, a 98.4% reduction. While Python offers convenience and a rich runtime, this example shows that for specific tasks native code can dramatically shrink the memory footprint, which matters in a memory-constrained world.


Original text

At this point in history, AI sociopaths have purchased all the world's RAM in order to run their copyright infringement factories at full blast. Thus the amount of memory in consumer computers and phones seems to be going down. After decades of not having to care about memory usage, reducing it has very much become a thing.

Relevant questions to this state of things include a) is it really worth it and b) what sort of improvements are even possible. The answers to these depend on the task and data set at hand. Let's examine one such case. It might be a bit contrived, unrepresentative and unfair, but on the other hand it's the one I already had available.

Suppose you have to write a script that opens a text file, parses it as UTF-8, splits it into words on whitespace, counts the number of times each word appears, and prints the words and counts in decreasing order (most common first).

This sounds like a job for Python. Indeed, an implementation takes fewer than 30 lines of code. Its memory consumption on a small text file looks like this:

[memory consumption graph in the original post]

Peak memory consumption is 1.3 MB. At this point you might want to stop reading and make a guess on how much memory a native code version of the same functionality would use.

A fully native C++ version using Pystd requires 60 lines of code to implement the same thing. If you ignore the boilerplate, the core functionality fits in 20 lines. The steps needed are straightforward:

  1. Mmap the input file to memory.
  2. Validate that it is UTF-8.
  3. Convert the raw data into a UTF-8 view.
  4. Split the view into words lazily.
  5. Compute the result into a hash table whose keys are string views, not strings.

The main advantage of this is that there are no string objects. The only dynamic memory allocations are for the hash table and the final vector used for sorting and printing. All text operations use string views, which are basically just a pointer plus a size.

In code this looks like the following:

Its memory usage looks like this:

[memory consumption graph in the original post]

Peak consumption is ~100 kB in this implementation. It uses only 7.7% of the amount of memory required by the Python version.

Is this comparison unfair to Python? In a way it is. The Python runtime has a hefty startup cost, but in return you get a lot of functionality for free. If you don't need said functionality, though, things start looking very different.

But we can make this comparison even more unfair towards Python. If you look at the memory consumption graph you'll quite easily see that 70 kB is used by the C++ runtime. It reserves a bunch of memory up front so that it can do stack unwinding and exception handling even when the process is out of memory. It should be possible to build this code without exception support (e.g. with GCC/Clang's -fno-exceptions), in which case the total memory usage would be a mere 21 kB. Such a version would yield a 98.4% reduction in memory usage.
