AI-powered reverse-engineering of Rosetta 2 (for Linux VM)

Original link: https://github.com/Inokinoki/attesor

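## Rosetta 2: A Deep Dive into Apple's Binary Translation Technology

This project documents a comprehensive reverse-engineering effort aimed at understanding Apple's Rosetta 2, the dynamic binary translation technology that lets Intel-based applications run on Apple Silicon Macs. It follows Apple's earlier transitions (Motorola to PowerPC, PowerPC to Intel), and Rosetta 2 is Apple's most advanced solution yet. It was essential to a smooth migration to the ARM64 architecture.

Rosetta 2 combines ahead-of-time (AOT) and just-in-time (JIT) translation. AOT translates binaries at install time for faster launches, while JIT handles dynamically generated code. Rosetta 2 maps x86_64 instructions to ARM64 and also translates vector instructions (SSE/AVX to NEON) and system calls.

Rosetta 2 lives in `/Library/Apple/usr/libexec/oah/`. It is not installed by default. Installation is triggered either by a prompt or from the command line. The project has identified and named 828 functions in Rosetta 2 and implemented 612 of them as clean C code. It has also grouped them into areas such as binary translation, system call handling, and memory management.

The work offers educational resources, documentation, and a community platform for reverse-engineering enthusiasts. Its aim is to share knowledge and contribute to a deeper understanding of this complex technology. The code is available under the MIT license and is intended for research use only.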

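A developer named inoki is using AI to reverse-engineer Rosetta 2, Apple's translation layer for running x86-64 code on Apple silicon. One possible goal is a Houdini-like binary translator for Intel instructions. The project is hosted on GitHub and focuses on the Linux version of Rosetta 2 that Apple provides for Linux virtual machines.

The AI-decompiled code is currently incomplete, and missing parts are marked with comments. No runnable binary has been produced yet, but progress is being made, including reverse-engineering some of the parts relevant to patching. The AI-generated code is sometimes speculative and needs human refinement before it runs correctly on Linux.

The project also aims to determine which parts of Rosetta 2 are specific to Apple silicon and which are more general. That could extend its use beyond the Apple ecosystem.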

Original text

License: MIT

A comprehensive reverse-engineering effort to understand and document Apple's Rosetta 2 binary translation technology.

  1. Background
  2. What is Rosetta?
  3. What is Rosetta 2?
  4. How Apple Delivers Rosetta 2 in macOS
  5. Technical Architecture
  6. This Project
  7. File Structure
  8. Usage
  9. Progress
  10. References

Background

The Architecture Transition

In November 2020, Apple announced its first Apple Silicon Macs, marking a historic transition from Intel x86_64 processors to its own ARM-based M1 chips. This was the Mac's third major architecture transition:

  1. 1994: Motorola 68000 -> PowerPC
  2. 2006: PowerPC -> Intel x86
  3. 2020: Intel x86_64 -> Apple Silicon (ARM64)

Each transition required a binary translation solution to run existing software during the migration period. Rosetta 2 is Apple's most sophisticated binary translation system yet.

What is Rosetta?

Rosetta (2006-2011) was Apple's first dynamic binary translation software, enabling PowerPC applications to run on Intel-based Macs.

  • Dynamic Translation: Translated PowerPC code to x86 at runtime
  • OS Integration: Built into Mac OS X 10.4 (Tiger) through 10.6 (Snow Leopard)
  • Transparent Operation: Users launched PowerPC apps normally
  • Performance Overhead: Typically 20-50% slower than native code

Rosetta was removed in Mac OS X 10.7 (Lion), completing the Intel transition.

What is Rosetta 2?

Rosetta 2 is Apple's advanced dynamic binary translation technology that enables applications compiled for Intel x86_64 Macs to run on Apple Silicon (ARM64) Macs.

┌─────────────────────────────────────────────────────────────┐
│                    User Application (x86_64)                 │
├─────────────────────────────────────────────────────────────┤
│                     Rosetta 2 Layer                          │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │  Translator │  │  Runtime    │  │  System Call        │  │
│  │  (AOT/JIT)  │  │  Library    │  │  Translation        │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
├─────────────────────────────────────────────────────────────┤
│                    macOS Kernel (ARM64)                      │
├─────────────────────────────────────────────────────────────┤
│                    Apple Silicon Hardware                    │
└─────────────────────────────────────────────────────────────┘

  1. Ahead-of-Time (AOT) Translation

    • Translates x86_64 binaries to ARM64 at install time
    • Stores translated code in a cache for faster subsequent launches
    • Reduces runtime overhead compared to pure JIT translation
  2. Just-in-Time (JIT) Translation

    • Translates code blocks on-demand during execution
    • Handles dynamically loaded code and self-modifying code
    • Maintains translation cache for efficiency
  3. Instruction Set Translation

    • x86_64 -> ARM64 instruction mapping
    • SSE/AVX -> NEON vector instruction translation
    • x86_64 flags -> ARM64 condition codes
  4. System Call Translation

    • Translates x86_64 macOS syscalls to ARM64 equivalents
    • Handles different calling conventions
    • Manages register state across syscall boundaries
  5. Runtime Support

    • CPU feature detection emulation
    • Thread-local storage handling
    • Signal and exception handling
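Both the AOT and JIT paths depend on a translation cache. Translators commonly structure it as a lookup-or-translate step: hash the guest (x86_64) program counter, return an existing ARM64 translation on a hit, and translate only on a miss. The sketch below shows that pattern with a direct-mapped table and Fibonacci hashing. Every name in it (`tcache_entry`, `tcache_lookup`, `tcache_insert`) is hypothetical and does not come from Rosetta 2, whose actual cache layout is not described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical direct-mapped translation cache: guest PC -> host code. */
#define TCACHE_BITS 12
#define TCACHE_SIZE (1u << TCACHE_BITS)

typedef struct {
    uint64_t guest_pc;   /* x86_64 address of a translated block */
    void    *host_code;  /* ARM64 code generated for it (NULL = empty slot) */
} tcache_entry;

static tcache_entry tcache[TCACHE_SIZE];

/* Fibonacci hashing: multiply by 2^64/phi and keep the top TCACHE_BITS bits. */
static size_t tcache_hash(uint64_t guest_pc)
{
    return (size_t)((guest_pc * 0x9E3779B97F4A7C15ull) >> (64 - TCACHE_BITS));
}

/* Return the cached translation for guest_pc, or NULL on a miss. */
void *tcache_lookup(uint64_t guest_pc)
{
    const tcache_entry *e = &tcache[tcache_hash(guest_pc)];
    return (e->host_code != NULL && e->guest_pc == guest_pc) ? e->host_code : NULL;
}

/* Record a new translation, evicting whatever occupied the slot. */
void tcache_insert(uint64_t guest_pc, void *host_code)
{
    assert(host_code != NULL);
    tcache_entry *e = &tcache[tcache_hash(guest_pc)];
    e->guest_pc = guest_pc;
    e->host_code = host_code;
}
```

A dispatcher built on this would call `tcache_lookup(pc)`. On `NULL`, it would translate the block, `tcache_insert` the result, and jump to it. Direct mapping keeps a hit to a single probe but evicts entries on collisions. A real translator must also invalidate entries when guest code is modified, which this sketch does not attempt.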

How Apple Delivers Rosetta 2 in macOS

Rosetta 2 is located at:

/Library/Apple/usr/libexec/oah/
├── rosetta        # Main translator binary
├── rosettad       # Rosetta daemon
└── librosetta.*   # Runtime libraries

The oah directory stands for "Old Architecture Hardware" - a continuation from the PowerPC transition era.

On Apple Silicon Macs, Rosetta 2 is not installed by default. Installation is triggered in one of two ways:

  1. First Launch Prompt

    The "Rosetta" software is not installed on your Mac.
    Rosetta translates apps from Intel-based Macs for use on Apple Silicon Macs.
    
  2. Command-Line Installation

    softwareupdate --install-rosetta --agree-to-license
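Translation is transparent once Rosetta 2 is installed, so a program that wants to know whether it is being translated has to ask at runtime. Apple's Rosetta documentation describes the `sysctl.proc_translated` key for this purpose. The key reads 1 for a translated process and 0 for a native one, and the lookup fails with `ENOENT` on systems where the key does not exist. A minimal sketch follows. The helper name `process_is_translated` is our own. The non-Apple branch is only a stub so the file compiles elsewhere, because this key is a macOS sysctl and is not available inside a Linux VM.

```c
#include <assert.h>

#ifdef __APPLE__
#include <errno.h>
#include <stddef.h>
#include <sys/sysctl.h>
#include <sys/types.h>
#endif

/* 1: translated by Rosetta, 0: native, -1: status could not be determined. */
int process_is_translated(void)
{
#ifdef __APPLE__
    int translated = 0;
    size_t size = sizeof translated;
    if (sysctlbyname("sysctl.proc_translated", &translated, &size, NULL, 0) == -1)
        return errno == ENOENT ? 0 : -1;   /* ENOENT: no translation support */
    return translated;
#else
    return -1;   /* portability stub: this sysctl only exists on macOS */
#endif
}
```

The function returns 0 on `ENOENT` because a system without the key, such as an Intel Mac, cannot be translating the process.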

| Component | Description |
|---|---|
| RosettaLinux/rosetta | Core ARM64 binary containing the translation engine |
| RosettaLinux/rosettad | System daemon managing translation services |
| debugserver -> /usr/libexec/rosetta/debugserver | Debugging support for translated processes |
| libRosettaRuntime | Runtime library linked during translation |
| translate_tool -> /usr/libexec/rosetta/translate_tool | Translation tool for building translated binaries |

  1. launchd Integration: Rosetta daemon runs as a system service
  2. Code Signing: Translated binaries are code-signed automatically
  3. Gatekeeper: Rosetta-translated apps pass security checks
  4. System Integrity Protection: Protected from modification

Technical Architecture

┌──────────────────────────────────────────────────────────────────┐
│ Phase 1: Binary Loading                                          │
│ ───────────────────────────────────────────────────────────────  │
│ 1. Load x86_64 Mach-O binary                                    │
│ 2. Parse segments, sections, symbols                            │
│ 3. Validate code signatures                                      │
│ 4. Map into translation context                                  │
└──────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────────────┐
│ Phase 2: AOT Translation                                         │
│ ───────────────────────────────────────────────────────────────  │
│ 1. Disassemble x86_64 code sections                              │
│ 2. Translate instructions to ARM64                               │
│ 3. Apply optimizations                                           │
│ 4. Store in translation cache (/var/db/oah)                      │
└──────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────────────┐
│ Phase 3: Runtime Execution                                       │
│ ───────────────────────────────────────────────────────────────  │
│ 1. Load translated ARM64 code                                    │
│ 2. Set up x86_64 emulation context                               │
│ 3. Handle JIT translations for dynamic code                      │
│ 4. Translate syscalls on-the-fly                                 │
└──────────────────────────────────────────────────────────────────┘
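Phase 1 opens with a sanity check that the input really is an x86_64 Mach-O image. A minimal sketch of that check reads the first eight bytes of a 64-bit Mach-O header: the magic number and `cputype`. The constants mirror `MH_MAGIC_64`, `CPU_TYPE_X86_64`, and `CPU_TYPE_ARM64` from Apple's `<mach-o/loader.h>` and `<mach/machine.h>`. They are redefined locally so the sketch builds on Linux. The function and enum names are our own, and universal (fat) binaries are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MACHO_MAGIC_64   0xFEEDFACFu                  /* MH_MAGIC_64 */
#define MACHO_CPU_ABI64  0x01000000u                  /* CPU_ARCH_ABI64 */
#define MACHO_CPU_X86_64 (7u  | MACHO_CPU_ABI64)      /* CPU_TYPE_X86_64 */
#define MACHO_CPU_ARM64  (12u | MACHO_CPU_ABI64)      /* CPU_TYPE_ARM64 */

typedef enum { MACHO_NOT_MACHO64, MACHO_X86_64, MACHO_ARM64, MACHO_OTHER_CPU } macho_kind;

/* Read a little-endian 32-bit value (x86_64 and arm64 Mach-O are both LE). */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Classify a buffer holding the start of a thin 64-bit Mach-O file. */
macho_kind classify_macho(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len < 8 || read_le32(buf) != MACHO_MAGIC_64)
        return MACHO_NOT_MACHO64;
    switch (read_le32(buf + 4)) {          /* cputype follows the magic */
    case MACHO_CPU_X86_64: return MACHO_X86_64;
    case MACHO_CPU_ARM64:  return MACHO_ARM64;
    default:               return MACHO_OTHER_CPU;
    }
}
```

A universal binary instead starts with `FAT_MAGIC` (0xcafebabe, stored big-endian). A loader would first select the x86_64 slice from it before applying this check.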

Key Translation Challenges

  1. Register Mapping

    • x86_64 has 16 GPRs; ARM64 has 31 GPRs
    • x86_64 flags register -> ARM64 NZCV flags
    • RIP (instruction pointer) emulation
  2. Memory Ordering

    • x86_64: Strong memory ordering (TSO)
    • ARM64: Weak memory ordering
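    • Requires memory barriers for correctness, unless the CPU provides a TSO mode; Apple Silicon does, and Rosetta 2 enables it for translated code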
  3. Vector Instructions

    • SSE (128-bit) -> NEON (128-bit) direct mapping
    • AVX (256-bit) -> NEON pair emulation
    • Different exception handling for SIMD
  4. Calling Conventions

    • x86_64 (System V ABI): first 6 integer/pointer args in registers (RDI, RSI, RDX, RCX, R8, R9)
    • ARM64: first 8 integer/pointer args in registers (X0-X7)
    • Different stack frame layouts
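The flags item in this list contains a classic trap. For subtraction and compare, x86's CF means "a borrow occurred", while ARM64's C flag means "no borrow". A translator therefore cannot copy the carry bit across unchanged. The sketch below computes the x86 flags for a 64-bit `SUB`/`CMP` in portable C and derives the matching ARM64 NZCV bits. The function and struct names are our own, and the parity and auxiliary-carry flags are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool cf, zf, sf, of; } x86_flags;   /* subset of RFLAGS */
typedef struct { bool n, z, c, v; } arm_nzcv;        /* ARM64 condition flags */

/* Flags that x86_64 `sub a, b` (or `cmp a, b`) would set. */
x86_flags x86_sub_flags(uint64_t a, uint64_t b)
{
    uint64_t r = a - b;
    x86_flags f;
    f.cf = a < b;                              /* unsigned borrow */
    f.zf = r == 0;
    f.sf = (r >> 63) & 1;
    f.of = (((a ^ b) & (a ^ r)) >> 63) & 1;    /* signed overflow */
    return f;
}

/* NZCV that ARM64 `subs`/`cmp` produces for the same operands. */
arm_nzcv x86_sub_to_nzcv(x86_flags f)
{
    arm_nzcv n;
    n.n = f.sf;
    n.z = f.zf;
    n.c = !f.cf;   /* ARM sets C when there is NO borrow: inverted vs x86 */
    n.v = f.of;
    return n;
}
```

N, Z, and V carry over directly; only C flips. ARMv8.4 added a `CFINV` instruction that inverts C in a single step, which is one way a translator can bridge the two conventions.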

This Project

This repository contains reverse-engineered implementations of functions from the Rosetta 2 binaries. Through careful analysis and decompilation, we've identified and documented the semantic purpose of hundreds of functions.

  1. Educational: Understand how Rosetta 2 works internally
  2. Documentation: Create comprehensive documentation of translation techniques
  3. Implementation: Provide clean, well-documented C implementations
  4. Community: Share knowledge with the reverse-engineering community

  • 828 functions identified and named in the main rosetta binary
  • 612 functions fully implemented with clean C code
  • 66 categories of functionality documented
  • Complete function name mappings with semantic names

| Category | Functions | Description |
|---|---|---|
| Entry Point | 1 | Rosetta initialization |
| FP/Vector Operations | ~20 | Floating-point and SIMD state management |
| SIMD Memory Operations | ~10 | memchr, memcmp, memcpy with SIMD |
| Vector Operations | ~30 | NEON vector arithmetic, comparison |
| Binary Translation | ~50 | x86_64 -> ARM64 instruction translation |
| Syscall Handlers | ~60 | System call translation and forwarding |
| Memory Management | ~20 | malloc, free, mmap wrappers |
| Hash Functions | ~5 | Address hashing for translation cache |
| String Operations | ~30 | SIMD-optimized string functions |
| Cryptographic Extensions | ~30 | AES, SHA, CRC32 passthrough |
| ELF Parsing | ~15 | Linux binary format support |
| Translation Cache | ~20 | AOT/JIT cache management |
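For the Linux build, one concrete job of the syscall handlers is renumbering. x86_64 and arm64 Linux use different syscall tables. Some x86_64 calls, such as `open`, have no arm64 equivalent and must be re-expressed, in that case as `openat` with `AT_FDCWD`. The sketch below maps a small subset using the kernel's published numbers. The function name and the `-1` convention for "needs a custom handler" are our own and are not taken from Rosetta.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative subset of Linux syscall numbers (x86_64 -> arm64). */
static const struct { int x86_64; int arm64; const char *name; } syscall_map[] = {
    {   0,  63, "read"       },
    {   1,  64, "write"      },
    {   2,  -1, "open"       },  /* no arm64 syscall: emulate via openat */
    {   3,  57, "close"      },
    {   9, 222, "mmap"       },
    {  11, 215, "munmap"     },
    {  12, 214, "brk"        },
    {  60,  93, "exit"       },
    { 231,  94, "exit_group" },
    { 257,  56, "openat"     },
};

/* Return the arm64 number for an x86_64 syscall, or -1 when the call is
 * unknown here or needs a custom handler instead of a plain renumbering. */
int translate_syscall_nr(int x86_nr)
{
    for (size_t i = 0; i < sizeof syscall_map / sizeof syscall_map[0]; i++)
        if (syscall_map[i].x86_64 == x86_nr)
            return syscall_map[i].arm64;
    return -1;
}
```

A complete handler must also move the arguments and the syscall number between register conventions. On x86_64, arguments are in RDI, RSI, RDX, R10, R8, R9 and the number is in RAX. On arm64, arguments go in X0-X5 and the number in X8. A linear scan is fine for a sketch; a dense array indexed by the x86_64 number would be the obvious faster choice.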

File Structure

Rosetta2/
├── README.md                      # This file
├── rosetta_decomp.c               # Original decompilation (74,677 lines)
├── rosettad_decomp.c              # Daemon decompilation
├── rosetta_refactored.c           # Refactored implementations
├── rosetta_refactored.h           # Type definitions and declarations
├── rosetta_refactored_complete.c  # Complete refactored code
├── rosetta_refactored_complete.h  # Complete header with implementations
├── rosetta_function_map.h         # Function name mapping table
├── rosettad_refactored.c          # Daemon-side refactoring
├── REFACTORING_COMPLETE.md        # Refactoring completion summary
└── SESSION_*.md                   # Session-by-session progress logs

Usage

# Compile with GCC
gcc -c rosetta_refactored.c -o rosetta_refactored.o

// Include in your project (and link rosetta_refactored.o)
#include "rosetta_refactored.h"

// Or use the single-header implementation instead
#define ROSETTA_IMPLEMENTATION
#include "rosetta_refactored_complete.h"

Example: Using Translation Functions

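#include <stdint.h>
#include "rosetta_refactored.h"

// run_one_block is an illustrative wrapper; statements must live inside a function.
// guest_pc is the x86_64 address of the block (uint64_t is an assumed type).
void run_one_block(uint64_t guest_pc)
{
    // Initialize Rosetta state
    thread_state_t *state = create_thread_state();

    // Translate the basic block that starts at guest_pc
    void *translated = translate_block(guest_pc);

    // Execute translated code
    execute_translated_block(translated, state);
}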

Progress

| Metric | Value |
|---|---|
| Total Functions | 828 |
| Functions Implemented | 612 |
| Completion | 74% |
| Categories Complete | 66/66 |

| Session | Functions | Focus |
|---|---|---|
| 34 | 27 | Additional Utility Functions |
| 33 | 19 | Cryptographic Extensions (SHA/CRC32) |
| 32 | 10 | Cryptographic Extensions (AES) |
| 31 | 27 | Advanced SIMD Operations |
| 30 | 21 | Saturating Convert Operations |



This project is for educational and research purposes only.

  • Rosetta 2 is proprietary Apple software
  • This project does not distribute Apple's binaries
  • All code in this repository is written by Claude Code with Qwen 3.5.
  • Do not use this project to circumvent Apple's security measures

MIT License - See LICENSE file for details.


Contributions are welcome! Areas of interest:

  1. Implementing remaining functions
  2. Improving documentation
  3. Adding test cases
  4. Performance analysis
  5. Architecture diagrams

Last updated: February 2026
