Self-referencing Page Tables for the x86-Architecture

Original link: https://0l.de/blog/2015/01/bachelor-thesis-abstract/

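## Efficient Page Table Management on the x86 Architecture

This work originated as a bachelor thesis and was continued during a research assistantship. It proposes a new approach to managing page tables on the x86 architecture (32 bit and 64 bit). Modern processors use multi-level page tables to translate virtual addresses to physical memory, a process that traditionally requires complex mappings and considerable memory overhead. The research uses a "self-reference" in the root page table, which lets the operating system access all page tables directly from the virtual address space (VAS) without manual mappings. This simplifies the code, reduces memory consumption, and improves maintainability. The technique relies on a consistent encoding of paging flags and on equal table sizes across all levels, both of which are properties of the x86 architecture. The approach was implemented and tested in the open source teaching operating system "eduOS"; the self-reference reserves a small, negligible part of the VAS for the page tables. Although Intel and AMD do not document the technique explicitly and operating system support is limited (there is only a single reference to possible use in the Microsoft NT kernel), the approach offers a notable optimization for page table management. The submission originally sent to the ASPLOS conference was rejected, but the author is now sharing the work with English-speaking readers.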


Original text

Almost fourteen months ago, I started working on my bachelor thesis. Although I finished it half a year ago, it’s still part of my work as a student research assistant.

During my initial work, most of the code was written for an internal research kernel. I’m now happy that we were able to port it to an open source kernel called eduOS (/RWTH-OS/eduOS). This minimal operating system is used for practical demos and assignments during the OS course at my university. There’s much more I could write about it, so that will probably be a separate blog post.

The motive for this article is an abstract I wrote for the student research competition of the ASPLOS conference, which is held this year in Istanbul, Turkey. Unfortunately, my submission was rejected. But as a nice side effect, I now have the chance to present my work to an English audience as well:

Download: Extended Abstract (PDF)

Steffen Vogel

Academic advisor: Dr. rer. nat. Stefan Lankes, Institute for Automation of Complex Power Systems, E.ON Energy Research Center, RWTH Aachen University, Mathieustr. 10, 52074 Aachen, Germany

This was a submission for ASPLOS Student Research Competition ’15 Istanbul, Turkey¹.

The adoption of 64 bit architectures went along with an extension of the virtual address space (VAS). To cope with this growth, the memory management unit (MMU) had to be extended as well. For paging-based systems like Intel’s x86-architecture this was realized by adding more levels of indirection to the page table walk.

This walk translates virtual pages to physical page frames (PF) by performing look-ups in a radix / prefix tree in which every node represents a page table (Figure 1a). Since the tables are part of the translation process, they must be referenced by physical page frame numbers (PFN, blue line). As the operating system can only access the VAS, it cannot follow the path of a walk directly. To be able to manipulate page tables, it must provide:

Access to the table entries, by mapping the tables themselves into the VAS.
A mapping from physical references to the corresponding locations in the VAS.

Figure 1a: Page table walk in the x86_64 long mode: Traditional, without self-reference.
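The index arithmetic behind the walk described above can be sketched in a few lines of C. This is a minimal illustration that assumes 4 KiB pages and the four long mode levels named in this article (PML4, PDPT, PGD, PGT); the helper names `table_index` and `page_offset` are hypothetical and not taken from eduOS.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BITS  12  /* assumption: 4 KiB pages, low 12 bits are the page offset */
#define INDEX_BITS 9   /* 512 entries per table in long mode */
#define LEVELS     4   /* 3 = PML4, 2 = PDPT, 1 = PGD, 0 = PGT */

/* Index into the table at `level` that the walk uses for `vaddr`. */
static unsigned table_index(uint64_t vaddr, int level)
{
    assert(level >= 0 && level < LEVELS);
    return (unsigned)((vaddr >> (PAGE_BITS + INDEX_BITS * level)) & 0x1FF);
}

/* Byte offset inside the final 4 KiB page frame. */
static unsigned page_offset(uint64_t vaddr)
{
    return (unsigned)(vaddr & 0xFFF);
}
```

Each level consumes nine bits of the address, so every additional level of indirection adds one more table that the operating system has to keep track of.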

Additionally, every level of the page table walk increases the complexity of managing these mappings. They also increase the memory consumption by occupying physical page frames. It is possible to avoid both drawbacks by the technique described in the following.

In my bachelor thesis, I presented an approach that is compatible with both the 32 bit and the 64 bit version of Intel’s x86-architecture. This allows two code bases, one for each architecture, to be replaced by a single one supporting both, which results in shorter, more comprehensible, and more maintainable code. Our teaching OS called “eduOS” was used as the foundation for this implementation². “eduOS” supports only the 32 bit protected mode, whereas the 64 bit long mode is only implemented for an internal research kernel.

Thanks to the sophisticated design of Intel’s x86 MMU, it is possible to avoid most of the complexity and space requirements by using a little trick. Adding a self-reference in the root table (PML4 or PGD, respectively) automatically enables access to all page tables from the VAS without the need for manual mappings as described above (Figure 1b). The operating system does not need to follow the path of a page table walk manually, because the MMU performs this task itself when it accesses individual tables instead of page frames.

Figure 1b: Page table walk in the x86_64 long mode: With self-reference.

An access to the VAS region covered by a self-reference causes the MMU to look up the root table twice (red line). Effectively, this shifts the whole page table walk by one level. The walk therefore ends with the PFN of a page table instead of the PFN of an ordinary page frame. Here, both the PML4 and the PDPT index are used to select an entry from the PML4 table, so it must be guaranteed that PML4 entries can also be interpreted as PDPT entries. This imposes the following requirements:

Homogeneous coding of paging flags across all paging levels.
Equal table sizes across all paging levels.
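The shift of the walk by one level can be reproduced with a small software model. The sketch below is a toy simulation, not MMU or eduOS code: `phys`, `walk`, `vaddr_of` and `build_example` are hypothetical names, physical memory is an array of eight frames, entries carry only a present bit, and slot 511 of the root table is assumed to hold the self-reference.

```c
#include <assert.h>
#include <stdint.h>

#define ENTRIES   512
#define FRAMES    8        /* toy physical memory: 8 frames */
#define PRESENT   0x1ULL
#define SELF_SLOT 511      /* assumption: last PML4 entry holds the self-reference */

static uint64_t phys[FRAMES][ENTRIES];  /* each frame viewed as one table */

static uint64_t make_entry(unsigned pfn) { return ((uint64_t)pfn << 12) | PRESENT; }

/* Build a virtual address from the four table indices (offset 0). */
static uint64_t vaddr_of(unsigned pml4, unsigned pdpt, unsigned pgd, unsigned pgt)
{
    return ((uint64_t)pml4 << 39) | ((uint64_t)pdpt << 30) |
           ((uint64_t)pgd << 21) | ((uint64_t)pgt << 12);
}

/* Software model of the 4-level walk: returns the final frame number,
 * or -1 if an entry on the path is not present. */
static int walk(unsigned root, uint64_t vaddr)
{
    unsigned frame = root;
    for (int level = 3; level >= 0; level--) {
        unsigned idx = (unsigned)((vaddr >> (12 + 9 * level)) & 0x1FF);
        uint64_t entry = phys[frame][idx];
        if (!(entry & PRESENT))
            return -1;
        frame = (unsigned)(entry >> 12);  /* toy entries have no high flag bits */
        assert(frame < FRAMES);
    }
    return (int)frame;
}

/* Frame 0 = PML4, 1 = PDPT, 2 = PGD, 3 = PGT, 4 = ordinary data page. */
static void build_example(void)
{
    phys[0][0] = make_entry(1);
    phys[1][0] = make_entry(2);
    phys[2][0] = make_entry(3);
    phys[3][0] = make_entry(4);
    phys[0][SELF_SLOT] = make_entry(0);  /* the self-reference */
}
```

After `build_example()`, `walk(0, vaddr_of(0, 0, 0, 0))` ends at the data page in frame 4, while `walk(0, vaddr_of(511, 0, 0, 0))` ends at frame 3, the PGT that maps that page. Using index 511 two, three or four times ends at the PGD, the PDPT and the PML4 itself.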

Fortunately, the x86-architecture complies with these prerequisites, as shown in Figure 2. Flags colored green are coded consistently across all paging levels. Only the PAT, size and global flags have a slightly different meaning for entries in the PGT. My bachelor thesis shows that these deviations still allow full control over the caching and memory protection properties of self-mapped tables. This also applies to common system calls like fork() and kill().
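As a rough illustration of the flag layout, the following C sketch lists the low flag bits of a long mode paging entry and builds a self-reference entry from them. The bit positions follow the usual x86 layout; the macro names, `make_self_entry` and `usable_as_self_entry` are hypothetical, and the policy of keeping the user, bit 7 and global flags clear is an assumption made for this example, not the rule set from the thesis.

```c
#include <assert.h>
#include <stdint.h>

/* Flags with the same position at every paging level. */
#define PG_PRESENT   (1ULL << 0)
#define PG_RW        (1ULL << 1)
#define PG_USER      (1ULL << 2)
#define PG_PWT       (1ULL << 3)
#define PG_PCD       (1ULL << 4)
#define PG_ACCESSED  (1ULL << 5)
/* Flags whose meaning depends on the level. */
#define PG_BIT7      (1ULL << 7)  /* page size in PDPT/PGD entries, PAT in PGT entries */
#define PG_GLOBAL    (1ULL << 8)  /* only meaningful for entries that map a page */
#define PG_ADDR_MASK 0x000FFFFFFFFFF000ULL

/* Build the self-reference entry for a root table at physical address root_phys. */
static uint64_t make_self_entry(uint64_t root_phys)
{
    assert((root_phys & ~PG_ADDR_MASK) == 0);  /* must be a page-aligned frame address */
    return root_phys | PG_PRESENT | PG_RW;
}

/* Assumed policy: the entry must be present, kernel-only, and must leave the
 * level-dependent bits clear so it reads the same at every level. */
static int usable_as_self_entry(uint64_t entry)
{
    return (entry & PG_PRESENT) != 0 &&
           (entry & (PG_USER | PG_BIT7 | PG_GLOBAL)) == 0;
}
```

Leaving bit 7 clear matters because the same entry is read as a PML4, PDPT, PGD and PGT entry depending on how often the walk passes through it.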

Figure 2: Similar flags across all paging levels.

By repeatedly addressing the self-reference, it is also possible to access tables of the upper levels (PGD to PML4). Table 1 shows the resulting virtual addresses of all page tables when using the last (512th) entry of the PML4 table for the self-reference³. This grants access to all possible page tables, including those which might not yet exist and may be allocated in the future. Hence, the self-reference reserves a fixed fraction of the VAS for the page tables. The size of this region is 256 TiB / 512 = 512 GiB for 64 bit (or 4 GiB / 1024 = 4 MiB for 32 bit), which is negligible in comparison to the huge VAS of 2^48 bytes.

Table 1: Virtual addresses of self-mapped tables.
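The base addresses of the self-mapped regions follow directly from the index arithmetic. The sketch below computes them under the assumptions stated in the text (48 bit VAS, 512 entries per table, self-reference in the last PML4 entry); the values are derived from these assumptions rather than copied from Table 1, and `canonical`, `selfmap_base` and `selfmap_region_size` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

#define SELF_SLOT 511ULL  /* assumption: last (512th) PML4 entry is the self-reference */

/* Sign-extend bit 47 into bits 48..63, as long mode requires. */
static uint64_t canonical(uint64_t v)
{
    return (v & (1ULL << 47)) ? (v | 0xFFFF000000000000ULL) : v;
}

/* Base virtual address of all self-mapped tables of one level:
 * 0 = PGTs, 1 = PGDs, 2 = PDPTs, 3 = the PML4 itself.
 * The self-reference index is used (level + 1) times, from the top down. */
static uint64_t selfmap_base(int level)
{
    assert(level >= 0 && level <= 3);
    uint64_t v = 0;
    for (int i = 0; i <= level; i++)
        v |= SELF_SLOT << (39 - 9 * i);
    return canonical(v);
}

/* One PML4 slot covers 1/512 of the 2^48 byte VAS. */
static uint64_t selfmap_region_size(void)
{
    return (1ULL << 48) / 512;
}
```

With these assumptions, `selfmap_base(0)` is 0xFFFFFF8000000000, `selfmap_base(3)` is 0xFFFFFFFFFFFFF000, and `selfmap_region_size()` is 512 GiB.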

For the manipulation of page table entries, two approaches are feasible:

Top-down: Use known tree traversals, starting at the root node, which corresponds to the PML4 or PGD, respectively.
Bottom-up: Use the page fault handler to create new tables on the fly when they are not yet present.
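For the top-down variant, the self-mapping turns the traversal into plain address arithmetic. The sketch below only computes where the four entries for a given virtual address would be found, from the PML4 entry down to the PGT entry. It assumes the self-reference sits in the last PML4 slot; `entry_vaddr` and `entries_top_down` are hypothetical names, not functions from eduOS.

```c
#include <assert.h>
#include <stdint.h>

#define SELF_SLOT 511ULL  /* assumption: last PML4 entry is the self-reference */

/* Virtual address of the entry that translates `vaddr` at `level`
 * (3 = PML4 entry, 2 = PDPT entry, 1 = PGD entry, 0 = PGT entry). */
static uint64_t entry_vaddr(uint64_t vaddr, int level)
{
    assert(level >= 0 && level <= 3);
    /* Drop the bits below this level, then scale by the 8-byte entry size. */
    uint64_t v = ((vaddr & 0x0000FFFFFFFFFFFFULL) >> (12 + 9 * level)) << 3;
    /* Route the walk through the self-reference (level + 1) times. */
    for (int i = 0; i <= level; i++)
        v |= SELF_SLOT << (39 - 9 * i);
    /* Slot 511 sets bit 47, so the canonical form always has bits 48..63 set. */
    return v | 0xFFFF000000000000ULL;
}

/* Fill out[0..3] with the entry addresses in top-down order. */
static void entries_top_down(uint64_t vaddr, uint64_t out[4])
{
    for (int level = 3; level >= 0; level--)
        out[3 - level] = entry_vaddr(vaddr, level);
}
```

A kernel using the top-down approach would read each of these addresses in order and allocate a missing table before touching the next level, since a lower-level address is only valid once the levels above it are present. In the bottom-up approach, it would access the lowest address directly and let the page fault handler create whatever is missing.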

There are also other architectures that satisfy the prerequisites described above. One of these is the Alpha⁴ architecture, whose reference manual suggests a similar approach. Intel and AMD do not mention the technique in their x86 manuals. In the field of operating systems, support is far more limited. There is only a single reference⁵, dating from 2010, indicating that Microsoft might use a similar approach for its NT kernel. Linux cannot benefit because its paging implementation must support a broad selection of virtual memory architectures, not all of which fulfill the requirements mentioned above.
