Show HN: LoongArch Userspace Emulator

Original link: https://github.com/libriscv/libloong

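## libloong: High-Performance LoongArch Emulator

libloong is a compact (~18k lines of code) and fast userspace emulator library for LoongArch, built on libriscv. It is designed for embedding in applications, particularly for scripting inside game engines. Its call overhead is about 4 ns, far lower than alternatives such as Lua (~150 ns) or Java. Key features include:

- support for 64-bit LoongArch (LA64) and the LSX/LASX vector instructions
- a C++ API with Rust and Go bindings
- safety features such as execution timeouts and memory protection

Configuration options enable debug output, binary translation, and threaded bytecode dispatch. The benchmarks are strong. As an interpreter it exceeds 3000 CoreMark, and the lightweight JIT reaches 38% of native performance. Embedded binary translation currently reaches about 77% of native speed, with a potential of about 90%. libloong excels where LoongArch code must run with low latency and in a sandbox.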

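A Hacker News user shared a link to "libloong", a userspace emulator for LoongArch processors (github.com/libriscv). An accompanying Medium article gives detailed notes on the project.

A commenter named "anthk" was enthusiastic about similar emulation work for SPARC/Solaris systems. Their main interest was running older software, such as Internet Explorer 5, for educational purposes. They pointed to the large body of open-source software that still runs well on modern Unix-like systems, citing the Arena browser, XFig, and XEphem as examples. In their view, these tools are easily overlooked but show the lasting value and compatibility of open-source software. They contrasted this with how hard it is today to run proprietary software such as IE5.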

Original text

A high-performance LoongArch userspace emulator library designed for embedding and scripting applications.

Built on the proven architecture of libriscv, libloong has competitive interpreter performance while maintaining a compact ~18k-line codebase.

For discussions & help, visit Discord.

- Fast LoongArch interpreter with optional JIT
- Ultra-low latency call overheads
- Support for 64-bit LoongArch (LA64)
- Support for vector LSX and LASX instructions
- C++ API with Rust and Go bindings
- Zero dependencies
- Execution timeout and memory safety
- First-class pause/resume support

Game engine scripting is where libloong excels. Traditional games expose modding in one of three ways: through shared libraries (full system access), through embedded VMs like Lua (~150 ns call overhead), or through Java runtimes. libloong has ~4 ns call overhead.
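To put these per-call figures in perspective, consider a hypothetical game that makes 10,000 script calls per frame at 60 frames per second. Both numbers are illustrative assumptions, not figures from the project. The call overhead alone then works out to:

$$
10^{4} \times 4\,\mathrm{ns} = 40\,\mu\mathrm{s}, \qquad
10^{4} \times 150\,\mathrm{ns} = 1.5\,\mathrm{ms}, \qquad
\frac{1\,\mathrm{s}}{60} \approx 16.7\,\mathrm{ms}
$$

At ~4 ns per call, that is about 0.24% of the frame budget. At ~150 ns per call, it is about 9%, a factor of 150/4 = 37.5 more.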

See the example Asteroid game.

CMake configuration options:

- `LA_DEBUG=ON/OFF`: enable debug output (default: `OFF`)
- `LA_BINARY_TRANSLATION=ON/OFF`: enable binary translation (default: `OFF`)
- `LA_THREADED=ON/OFF`: enable threaded bytecode dispatch (default: `ON`)
- `LA_MASKED_MEMORY_BITS=N`: set the masked memory arena size to 2^N bytes (`0` = disabled, default: `0`)

Example with options:

```shell
cmake .. -DCMAKE_BUILD_TYPE=Release \
         -DLA_MASKED_MEMORY_BITS=32 \
         -DLA_BINARY_TRANSLATION=ON
make -j6
```

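```cpp
#include <libloong/machine.hpp>

#include <cstdint>
#include <fstream>
#include <iterator>
#include <vector>

// Minimal file loader (assumed helper, not part of the libloong API):
// reads the whole file into memory as raw bytes.
static std::vector<uint8_t> load_file(const char* path) {
    std::ifstream file(path, std::ios::binary);
    return std::vector<uint8_t>(std::istreambuf_iterator<char>(file),
                                std::istreambuf_iterator<char>());
}

int main() {
    // Load a LoongArch ELF binary
    std::vector<uint8_t> binary = load_file("program.elf");

    // Create a machine with 64 MB of memory
    loongarch::Machine machine { binary, {
        .memory_max = 64 * 1024 * 1024
    }};

    // Set up program arguments and environment variables
    machine.setup_linux({"program"}, {"LC_ALL=C"});

    // Run the program
    machine.simulate();
}
```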

STREAM memory benchmark:

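| Function | Best rate (MB/s) | Avg time (s) | Min time (s) | Max time (s) |
|----------|-----------------:|-------------:|-------------:|-------------:|
| Copy     | 33146.7          | 0.004884     | 0.004827     | 0.004962     |
| Scale    | 27825.2          | 0.005792     | 0.005750     | 0.005920     |
| Add      | 31388.6          | 0.007712     | 0.007646     | 0.007797     |
| Triad    | 29250.7          | 0.008268     | 0.008205     | 0.008379     |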

There is also a STREAM-like benchmark written in Rust in the examples:

| Kernel | Data size | Rate      | Min time | Avg time | Max time |
|--------|----------:|----------:|---------:|---------:|---------:|
| Fill   | 76.3 MiB  | 27.9 GB/s | 2.9 ms   | 3.1 ms   | 3.3 ms   |
| Copy   | 153 MiB   | 35.3 GB/s | 4.5 ms   | 4.6 ms   | 5.0 ms   |
| Scale  | 153 MiB   | 23.0 GB/s | 7.0 ms   | 7.0 ms   | 7.1 ms   |
| Add    | 229 MiB   | 31.9 GB/s | 7.5 ms   | 7.6 ms   | 7.7 ms   |
| Triad  | 229 MiB   | 11.1 GB/s | 21.5 ms  | 21.6 ms  | 21.8 ms  |

Figure: CoreMark 1.0 scores of interpreters, Dec 2025 (Ryzen 7950X)

Register-based interpreters still come out on top at the end of 2025. libloong is currently the fastest 64-bit interpreter, reliably reaching a CoreMark score above 3000.

The lightweight JIT reaches 38% of native performance (15.5k vs. 41k CoreMark) while keeping full feature parity with the interpreter:

CoreMark 1.0 : 15580.375613 / GCC14.2.0 -O3 -DPERFORMANCE_RUN=1 / Static

Using embedded binary translation, it's currently possible to reach ~77% of native:

CoreMark 1.0 : 31962.238533 / GCC14.2.0 -O3 -DPERFORMANCE_RUN=1 / Static

However, more work is needed to reach its full potential. The upper bound for embedded binary translation should be around 90% of native.
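As a rough cross-check, divide each score by the rounded native score of 41k quoted for the JIT comparison:

$$
\frac{15580}{41000} \approx 0.38, \qquad
\frac{31962}{41000} \approx 0.78, \qquad
0.90 \times 41000 \approx 36900
$$

The JIT ratio matches the quoted 38%. The binary-translation ratio comes out slightly above the quoted ~77%, which would fit an exact native score nearer 41.5k than the rounded 41k. At the stated ~90% upper bound, embedded binary translation would land near 37k CoreMark.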
