# Memory Subsystem Optimizations

Original link: https://johnnysswlab.com/memory-subsystem-optimizations/

## Memory Subsystem Optimizations: Summary

This series consists of 18 blog posts focused on making software faster through efficient use of the memory subsystem, which matters most for applications that process large datasets but also benefits smaller ones. The main areas covered are **reducing the number of memory accesses** (through register use and compiler techniques), **improving data locality** through changes to access patterns and to class and data-structure layout, and **shrinking the dataset** to gain speed. The series also looks at runtime optimizations through **custom memory allocators** and at **increasing instruction-level parallelism**. Further topics include **hiding memory latency** with prefetching, **minimizing TLB cache misses**, and **saving memory bandwidth** in order to be a "good neighbor". It also covers special considerations for **multithreaded applications** and **low-latency systems**, **measuring memory subsystem performance**, and the relationship between **branch prediction and memory accesses**. Throughout, the blog offers practical techniques for making software faster and more efficient by understanding and exploiting the memory hierarchy.


## Original Post

On this blog I have written 18 posts about memory subsystem optimizations, by which I mean optimizations that aim to make software faster by using the memory subsystem more effectively. Most of them apply to software that works with large datasets, but some apply to software that works with any data, regardless of its size.

Do you need to discuss a performance problem in your project? Or maybe you want a vectorization training for yourself or your team? Contact us
Or follow us on LinkedIn, Twitter, or Mastodon and get notified as soon as new content becomes available.

Here is a list of all posts that we covered on Johnny’s Software Lab:

| Topic | Description | Link |
| --- | --- | --- |
| Decreasing Total Memory Accesses | We speed up software by keeping data in registers instead of reloading it from the memory subsystem several times (sketch below). | Decreasing the Number of Memory Accesses 1/2; Decreasing the Number of Memory Accesses: The Compiler's Secret Life 2/2 |
| Changing the Data Access Pattern to Increase Locality | By changing our data access pattern we increase the likelihood that our data is in the fastest level of the data cache (sketch below). | For Software Performance, the Way Data is Accessed Matters! |
| Changing the Data Layout: Classes | Selecting a proper class data layout can improve software performance (sketch below). | Software Performance and Class Layout |
| Changing the Data Layout: Data Structures | By changing the data layout of common data structures, such as linked lists, trees, or hash maps, we can improve their performance. | Faster hash maps, binary trees etc. through data layout modification |
| Decreasing the Dataset Size | Memory efficiency can be improved by decreasing the dataset size. This results in speed improvements as well. | Memory consumption, dataset size and performance: how does it all relate? |
| Changing the Memory Layout | Whereas data layout is determined at compile time, memory layout is determined by the system allocator at runtime. We examine how changing the memory layout using custom allocators influences software performance (sketch below). | Performance Through Memory Layout |
| Increasing instruction-level parallelism | Some code cannot utilize the memory subsystem fully because of instruction dependencies. Here we investigate techniques that break dependencies and improve performance (sketch below). | Instruction-level parallelism in practice: speeding up memory-bound programs with low ILP; Hiding Memory Latency With In-Order CPU Cores OR How Compilers Optimize Your Code |
| Software prefetching for random data accesses | Explicit software prefetches tell the hardware that you will be accessing a certain piece of data soon. When used smartly, they can improve software performance (sketch below). | The pros and cons of explicit software prefetching |
| Decreasing TLB cache misses | The TLB cache is a small cache that speeds up the translation of virtual to physical memory addresses. In some cases, it can be the reason for poor performance. We investigate techniques for decreasing TLB cache misses (sketch below). | Speeding Up Translation of Virtual To Physical Memory Addresses: TLB and Huge Pages |
| Saving the memory subsystem bandwidth | In some cases we don't care so much about our own program's performance, but we do care about being a good neighbor. We investigate techniques that make our software consume the least possible amount of memory subsystem resources. | Frugal Programming: Saving Memory Subsystem Bandwidth |
| Branch prediction and data caches | We investigate the delicate interplay between branch prediction and the memory subsystem. | Unexpected Ways Memory Subsystem Interacts with Branch Prediction |
| Multithreading and the Memory Subsystem | Here we investigate how the memory subsystem behaves in the presence of multithreading and how that affects software speed (sketch below). | Multithreading and the Memory Subsystem |
| Low-latency applications | In some cases we are more interested in low latency than high throughput. We investigate techniques aimed at improving latency, either by modifying our programs or by reconfiguring the system. | Latency-Sensitive Applications and the Memory Subsystem: Keeping the Data in the Cache; Latency-Sensitive Application and the Memory Subsystem Part 2: Memory Management Mechanisms |
| Measuring Memory Subsystem Performance | We talk about the tools and metrics you can use to understand what is going on with the memory subsystem. | Measuring Memory Subsystem Performance |
| Other topics | A few remaining topics related to memory subsystem optimizations that didn't fit any of the other categories. | Memory Subsystem Optimizations – The Remaining Topics |
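
As a rough illustration of the first topic, keeping data in registers instead of reloading it, here is a minimal C++ sketch of my own (the function names are made up, not taken from the linked posts). Accumulating into a local variable avoids a load and a store through `out` on every iteration, something the compiler often cannot do by itself because `out` and `data` may alias.

```cpp
#include <cstddef>

// Hypothetical example: accumulate through the output pointer.
// Because `out` and `data` may alias, the compiler typically reloads
// and stores *out on every iteration.
void sum_naive(const int* data, std::size_t n, long* out) {
    *out = 0;
    for (std::size_t i = 0; i < n; ++i)
        *out += data[i];          // load *out, add, store *out each time
}

// Accumulating into a local variable lets the running sum live in a
// register; memory is written only once at the end.
void sum_register(const int* data, std::size_t n, long* out) {
    long acc = 0;                 // stays in a register
    for (std::size_t i = 0; i < n; ++i)
        acc += data[i];
    *out = acc;                   // single store
}
```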
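
A minimal sketch of the data-access-pattern idea, again with made-up names: for a matrix stored row-major, iterating row by row touches consecutive addresses, while iterating column by column jumps `cols` elements between accesses and keeps missing the cache.

```cpp
#include <vector>

// Hypothetical example: the matrix is stored row-major in a flat vector.
long sum_column_major(const std::vector<int>& m, int rows, int cols) {
    long s = 0;
    for (int c = 0; c < cols; ++c)        // stride of `cols` elements: poor locality
        for (int r = 0; r < rows; ++r)
            s += m[r * cols + c];
    return s;
}

long sum_row_major(const std::vector<int>& m, int rows, int cols) {
    long s = 0;
    for (int r = 0; r < rows; ++r)        // unit stride: consecutive cache lines
        for (int c = 0; c < cols; ++c)
            s += m[r * cols + c];
    return s;
}
```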
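
For the class-layout topic, one simple layout change (certainly not the only one the post covers) is ordering members from largest to smallest to remove alignment padding; the sizes in the comments assume a typical 64-bit ABI, and the struct names are invented for this sketch.

```cpp
#include <cstdint>

// Hypothetical example; sizes assume natural alignment on a 64-bit ABI.
struct RecordPadded {        // 24 bytes
    std::uint8_t  flag;      // 1 byte, then 7 bytes of padding
    std::uint64_t id;        // 8 bytes
    std::uint16_t type;      // 2 bytes, then 6 bytes of tail padding
};

struct RecordPacked {        // 16 bytes: same members, largest first
    std::uint64_t id;        // 8 bytes
    std::uint16_t type;      // 2 bytes
    std::uint8_t  flag;      // 1 byte, then 5 bytes of tail padding
};

static_assert(sizeof(RecordPacked) < sizeof(RecordPadded),
              "reordering members shrinks the struct");
```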
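
For the memory-layout topic, a bump ("arena") allocator is one common kind of custom allocator; this is a bare-bones sketch of the idea, not the allocator from the linked post. Objects allocated from it end up packed next to each other in one block, so linked structures built on top of it traverse consecutive cache lines instead of whatever addresses a general-purpose allocator happens to return.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: a bump allocator handing out memory from one
// contiguous buffer. There is no per-object free; everything is released
// when the arena is destroyed. `align` must be a power of two.
class Arena {
public:
    explicit Arena(std::size_t bytes) : buffer_(bytes), offset_(0) {}

    void* allocate(std::size_t size, std::size_t align) {
        std::size_t aligned = (offset_ + align - 1) & ~(align - 1);
        if (aligned + size > buffer_.size())
            return nullptr;                       // arena exhausted
        offset_ = aligned + size;
        return buffer_.data() + aligned;
    }

private:
    std::vector<std::byte> buffer_;               // one big contiguous block
    std::size_t offset_;                          // bump pointer
};
```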
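
For instruction-level parallelism, a classic dependency-breaking trick is to split a reduction across several independent accumulators; the sketch below (my own, with two accumulators) illustrates it. Note that for floating point this changes the order of additions, which is why a compiler will not normally do it on its own without flags such as `-ffast-math`.

```cpp
#include <cstddef>

// Hypothetical example: two independent accumulators break the serial
// add-after-add dependency chain, letting loads and adds overlap.
double dot_two_accumulators(const double* a, const double* b, std::size_t n) {
    double acc0 = 0.0, acc1 = 0.0;
    std::size_t i = 0;
    for (; i + 1 < n; i += 2) {
        acc0 += a[i] * b[i];          // chain 0
        acc1 += a[i + 1] * b[i + 1];  // chain 1, independent of chain 0
    }
    if (i < n)
        acc0 += a[i] * b[i];          // odd leftover element
    return acc0 + acc1;
}
```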
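
For explicit software prefetching, the sketch below uses GCC/Clang's `__builtin_prefetch` on a gather with unpredictable indices; the prefetch distance of 8 is an arbitrary placeholder that would need tuning on real hardware, and the function name is invented.

```cpp
#include <cstddef>

// Hypothetical example: the indices are known in advance but point at
// essentially random locations, so the hardware prefetcher cannot help.
// Prefetching a few iterations ahead can hide part of the miss latency.
long gather_with_prefetch(const int* values, const int* indices, std::size_t n) {
    constexpr std::size_t kDistance = 8;   // placeholder; tune per machine
    long sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (i + kDistance < n)
            __builtin_prefetch(&values[indices[i + kDistance]]);  // read prefetch
        sum += values[indices[i]];
    }
    return sum;
}
```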
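
For the TLB topic, one Linux-specific way to reduce TLB misses for a large allocation is to request transparent huge pages with `madvise(MADV_HUGEPAGE)`. This is only a sketch of that one technique (the linked post covers more); the hint is advisory and the kernel is free to ignore it.

```cpp
#include <cstdlib>      // posix_memalign, free
#include <sys/mman.h>   // madvise, MADV_HUGEPAGE (Linux-specific)

// Hypothetical example: allocate a large, 2 MiB-aligned block and ask the
// kernel to back it with transparent huge pages, so a single TLB entry can
// cover 2 MiB instead of 4 KiB.
void* allocate_huge_backed(std::size_t bytes) {
    void* p = nullptr;
    if (posix_memalign(&p, 2 * 1024 * 1024, bytes) != 0)
        return nullptr;
    madvise(p, bytes, MADV_HUGEPAGE);   // advisory only
    return p;                           // release with free()
}
```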
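
For the multithreading topic, the sketch below shows one well-known interaction between threads and the memory subsystem, false sharing: per-thread counters packed into one cache line force that line to bounce between cores, and padding each counter to its own line avoids the contention. The 64-byte line size is an assumption, and the code requires C++17 for the over-aligned vector elements.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical example: each counter is padded to a full (assumed 64-byte)
// cache line so threads incrementing neighboring counters do not keep
// invalidating each other's cache lines.
struct alignas(64) PaddedCounter {
    std::atomic<long> value{0};
};

long count_in_parallel(int num_threads, long iters_per_thread) {
    std::vector<PaddedCounter> counters(num_threads);
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t)
        workers.emplace_back([&counters, iters_per_thread, t] {
            for (long i = 0; i < iters_per_thread; ++i)
                counters[t].value.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& w : workers)
        w.join();

    long total = 0;
    for (auto& c : counters)
        total += c.value.load();
    return total;
}
```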

Any feedback on the material covered in these posts will be highly appreciated.

Do you need to discuss a performance problem in your project? Or maybe you want a vectorization training for yourself or your team? Contact us
Or follow us on LinkedIn, Twitter, or Mastodon and get notified as soon as new content becomes available.
