Riscrithm – An intuitive RISC-V assembler and optimizer coded in Go

Original link: https://github.com/ghetea-patrick/riscrithm

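Riscrithm is a high-level macro-assembly dialect that aims to bridge the gap between human-readable code and raw RISC-V assembly. It simplifies complex operations through macros and shorthands while keeping deterministic bare-metal control.

**Key features:**

* **Compilation:** Use the `riscrithm` command-line interface to compile a source file into a `.s` assembly file, with optimization available through the optional `-o` flag.
* **Syntax and scoping:** The language is strictly indentation-based. Code is organized into labeled blocks; unindented code triggers a syntax error. The `!!` label prefix inserts raw RISC-V instructions.
* **Operations:** Riscrithm provides intuitive syntax for register management (such as `load`, `move`, `swap`), arithmetic, and stack/heap memory access (such as `stack.w`, `heap.d`).
* **Logic:** It uses ternary-style `if/else` branching and loop labels, replacing complex boilerplate with readable primitives.
* **Optimization:** The compiler uses a two-pass architecture. When enabled, the optimizer performs dead assignment elimination, identity math removal, and strength reduction (for example, turning multiplication by a power of 2 into a shift).

Riscrithm enforces naming conventions (camelCase for registers, snake_case for labels, SCREAMING_SNAKE_CASE for constants), so the generated assembly is clean, readable, and ready to drop into a hardware simulator or debugger.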


Original text

Hey there. If you're looking at this, you are probably getting your hands dirty with Riscrithm, a high-level macro-assembly dialect that compiles straight down to pure RISC-V assembly. Think of it as a bridge between the readability of a high-level language and the raw, deterministic control of bare-metal hardware. Let's dive straight into how the compiler works, the syntax rules, and what's happening under the hood.

To compile your source code, you'll use the riscrithm CLI tool. The syntax is straightforward:

riscrithm "source_code_file" "assembly_target_file" [-o/--optimize]
  • Source Code: Your Riscrithm input file.
  • Target File: The generated .s assembly file. If this file doesn't exist, the compiler will create it for you on the fly.
  • Optimization: Pass -o or --optimize to enable the optimization sweep (more on the compiler architecture later).

2. File Structure & Globals

Every Riscrithm file must declare its target section and entrypoint at the very top. These, along with macro definitions, are the only lines allowed to exist completely unindented outside of a label block.

  • header : Sets the assembly section. For instance, header default translates to .section .text.
  • entrypoint : Defines where the program starts. Passing entrypoint main translates to .globl main.

header default
entrypoint main

You can define text-replacement macros using the define keyword. This is perfect for aliasing registers or creating single-line inline functions. Here are some classic developer examples:

define foo = x1
define bar = x2
define baz = x3
define horseBattery = x4
define apple = 10
define orange = 20
define clearFoo = foo ^^

Whenever the parser sees foo, it swaps it with x1 before processing any actual logic.

Comments are written using the # symbol. The compiler strips out anything following a # on any line, so you can place them anywhere safely.
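
To make the comment and macro rules concrete, here is a minimal Go sketch of how a preprocessor along these lines could work. It is not taken from the Riscrithm source: the function names (stripComment, parseDefine, expand) are hypothetical, and it assumes macros are matched as whole whitespace-separated tokens, which is a simplification (it collapses indentation and would not expand a name glued to another symbol, such as &bar).

```go
package main

import (
	"fmt"
	"strings"
)

// stripComment drops everything from the first '#' to the end of the line.
func stripComment(line string) string {
	if i := strings.IndexByte(line, '#'); i >= 0 {
		line = line[:i]
	}
	return strings.TrimRight(line, " \t")
}

// parseDefine recognizes "define name = replacement" (hypothetical helper).
func parseDefine(line string) (name, body string, ok bool) {
	if !strings.HasPrefix(line, "define ") {
		return "", "", false
	}
	parts := strings.SplitN(strings.TrimPrefix(line, "define "), "=", 2)
	if len(parts) != 2 {
		return "", "", false
	}
	return strings.TrimSpace(parts[0]), strings.TrimSpace(parts[1]), true
}

// expand swaps every token that names a macro for its replacement text.
// A replacement may mention other macros, so the pass repeats a few times.
func expand(line string, macros map[string]string) string {
	for depth := 0; depth < 8; depth++ {
		changed := false
		tokens := strings.Fields(line)
		for i, t := range tokens {
			if body, ok := macros[t]; ok {
				tokens[i] = body
				changed = true
			}
		}
		line = strings.Join(tokens, " ")
		if !changed {
			break
		}
	}
	return line
}

func main() {
	macros := map[string]string{}
	src := []string{
		"define foo = x1",
		"define clearFoo = foo ^^",
		"clearFoo # zero the register",
	}
	for _, raw := range src {
		line := stripComment(raw)
		if name, body, ok := parseDefine(line); ok {
			macros[name] = body
			continue
		}
		fmt.Println(expand(line, macros))
	}
}
```

In this sketch clearFoo first becomes foo ^^ and then x1 ^^, which matches the idea that text replacement happens before any instruction logic runs.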

3. Labels, Indentation, and Raw Blocks

Riscrithm is strictly scoped via indentation.

Labels define your execution blocks and must end with a colon. They must not have any indentation. Conversely, every instruction inside a label must be indented (spaces or tabs). If you leave an instruction unindented, the compiler will throw a SyntaxError.

main:
    load foo = apple
    move bar = foo
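
The indentation rule can be sketched as a small line classifier. The Go code below is only an illustration under stated assumptions: classify and isGlobal are hypothetical names, the error text is invented, and treating an indented label as an error is my reading of "must not have any indentation" rather than confirmed compiler behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// isGlobal reports whether a line starts with one of the keywords that the
// text above allows outside a label block.
func isGlobal(s string) bool {
	for _, kw := range []string{"header", "entrypoint", "define"} {
		if s == kw || strings.HasPrefix(s, kw+" ") {
			return true
		}
	}
	return false
}

// classify sorts one comment-free line into "blank", "global", "label" or
// "instruction", and returns an error when the indentation rule is broken.
func classify(line string) (string, error) {
	trimmed := strings.TrimSpace(line)
	if trimmed == "" {
		return "blank", nil
	}
	indented := line[0] == ' ' || line[0] == '\t'
	isLabel := strings.HasSuffix(trimmed, ":")
	switch {
	case indented && isLabel:
		return "", fmt.Errorf("SyntaxError: label %q must not be indented", trimmed)
	case indented:
		return "instruction", nil
	case isLabel:
		return "label", nil
	case isGlobal(trimmed):
		return "global", nil
	default:
		return "", fmt.Errorf("SyntaxError: instruction %q must be indented", trimmed)
	}
}

func main() {
	lines := []string{"header default", "main:", "    load foo = apple", "move bar = foo"}
	for _, line := range lines {
		kind, err := classify(line)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%-12s %q\n", kind, line)
	}
}
```

The last sample line is deliberately unindented, so the sketch reports it as an error instead of classifying it.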

If you need to bypass the Riscrithm preprocessor and write raw RISC-V assembly, prefix your label with !!. The compiler strips the exclamation marks but passes everything inside that block completely untouched. Macros and shorthands will not expand here.

!!raw_block:
    li x1, 10
    foo ^^ # This stays exactly as written!

4. Core Features & Instructions

Here is the meat of the language. Riscrithm maps readable statements directly to hardware instructions.

System & Interrupt Controls

Instead of remembering privilege-level opcodes, use explicit system calls:

Riscrithm     RISC-V Assembly   Description
interrupt.u   uret              User-mode trap return
interrupt.s   sret              Supervisor-mode trap return
interrupt.m   mret              Machine-mode trap return
wait          wfi               Wait for interrupt (low-power state)
trap          ebreak            Debugger trap
halt          ecall             System environment call / halt
...           nop               No-operation (ellipsis)

Let’s talk about code style. To keep your Riscrithm source files readable and consistent, the compiler expects (and highly encourages) a clean split in how you name your identifiers. Here is the naming convention breakdown:

  • Variables & Registers (camelCase): Any variable alias or register macro you define should start with a lowercase letter, with each subsequent word capitalized.
    • Examples: firstNum, addressRegister, stackOffset
  • Labels & Code Blocks (snake_case): Execution targets, loop boundaries, and conditional blocks use lowercase words separated by underscores. This makes them pop visually against instructions.
    • Examples: loop_start, on_true, error_handler
  • Constants & Literals (SCREAMING_SNAKE_CASE): Hardcoded configuration values, static offsets, or global definitions that shouldn't change use all-uppercase letters separated by underscores.
    • Examples: DEFAULT_HEADER, MAX_BUFFER_SIZE, IMM_VALUE

6. Complete Operator & Expression Reference

Excluding the hardware system traps and conditional branching symbols, here is the complete table of mutators, arithmetic expressions, and memory operators supported by the compiler.

Core Expressions and Memory Operators

Riscrithm Syntax         Category        Internal Expansion / Behavior        Target RISC-V Assembly
load =                   Assignment      Direct immediate assignment          li reg, imm
move =                   Assignment      Register-to-register copy            mv reg1, reg2
swap                     Value Exchange  Triple-XOR non-destructive swap      xor reg1, reg1, reg2
                                                                              xor reg2, reg1, reg2
                                                                              xor reg1, reg1, reg2
-> stack.[b/w/d]         Stack Memory    Dec pointer, store byte/word/double  addi sp, sp, -offset
                                                                              s[b/w/d] reg, 0(sp)
<- stack.[b/w/d]         Stack Memory    Load byte/word/double, inc pointer   l[b/w/d] reg, 0(sp)
                                                                              addi sp, sp, offset
= stack.[b/w/d]          Stack Memory    Peek value from top of stack         l[b/w/d] reg, 0(sp)
<- heap.[b/w/d] from &   Heap Memory     Base-register memory read (load)     l[b/w/d] reg1, 0(reg2)
-> heap.[b/w/d] from &   Heap Memory     Base-register memory write (store)   s[b/w/d] reg1, 0(reg2)

This section covers basic math operations, self-mutators, and compound shorthands. Remember, the compiler automatically realigns regular operations to their immediate equivalent (addi, andi, etc.) if the right-hand side is an integer literal.

Riscrithm Syntax   Operator Type     Evaluated Expression
++                 Self Operator     = + 1
--                 Self Operator     = - 1
^^                 Self Operator     = ^ (Fast Register Clear)
+=                 Compound Tag      = +
-=                 Compound Tag      = -
*=                 Compound Tag      = *
/=                 Compound Tag      = /
%=                 Compound Tag      = %
<<=                Compound Tag      = <<
>>=                Compound Tag      = >>
= +                Base Arithmetic   Addition (Supports immediate realignment)
= -                Base Arithmetic   Subtraction (Supports immediate realignment)
= &                Base Arithmetic   Bitwise AND (Supports immediate realignment)
= |                Base Arithmetic   Bitwise OR (Supports immediate realignment)
= ^                Base Arithmetic   Bitwise XOR (Supports immediate realignment)
= <<               Base Arithmetic   Logical Shift Left (Supports immediate realignment)
= >>               Base Arithmetic   Logical Shift Right (Supports immediate realignment)
= *                Base Arithmetic   Hardware Multiplication (M-Extension)
= /                Base Arithmetic   Hardware Division (M-Extension)
= %                Base Arithmetic   Hardware Remainder (M-Extension)

Branching and Conditionals

To unconditionally jump, use the @ symbol:

@some_label # Compiles to: j some_label

For conditional branching, Riscrithm uses an inline if/else ternary style. The compiler automatically maps your logic to beq, bne, blt, or bge. Because the base ISA has no dedicated branch for > or <=, it swaps the two operands at compile time to express those comparisons with blt and bge.

if foo == bar @true_block else @false_block
if foo > baz @greater_block else @lesser_block
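
The operator-to-branch mapping described above can be sketched as a lookup table. This Go snippet is an illustration, not the compiler's code: branchTable and lowerIf are hypothetical names, and emitting a trailing j for the else target is an assumption about the generated output.

```go
package main

import "fmt"

// branchTable maps a comparison operator to a base RISC-V branch mnemonic and
// records whether the two operands must trade places first.
var branchTable = map[string]struct {
	mnemonic string
	swap     bool
}{
	"==": {"beq", false},
	"!=": {"bne", false},
	"<":  {"blt", false},
	">=": {"bge", false},
	">":  {"blt", true}, // a > b is the same test as b < a
	"<=": {"bge", true}, // a <= b is the same test as b >= a
}

// lowerIf renders "if a op b @onTrue else @onFalse" as a conditional branch
// followed by an unconditional jump to the else target (assumed layout).
func lowerIf(a, op, b, onTrue, onFalse string) (string, error) {
	entry, ok := branchTable[op]
	if !ok {
		return "", fmt.Errorf("unsupported comparison %q", op)
	}
	if entry.swap {
		a, b = b, a
	}
	return fmt.Sprintf("%s %s, %s, %s\nj %s", entry.mnemonic, a, b, onTrue, onFalse), nil
}

func main() {
	out, err := lowerIf("foo", ">", "baz", "greater_block", "lesser_block")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

For the second example above, the sketch swaps foo and baz and emits a blt, since "foo > baz" and "baz < foo" are the same test.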

Loops (Infinite and Conditional)

Riscrithm doesn't have a dedicated while or for keyword because you don't need them. You build loops the old-school way using labels, conditionals, and jumps. An Infinite Loop:

infinite_loop:
    foo ++
    @infinite_loop

A Conditional Loop:

main:
    load foo = 0
    load bar = 10

loop_start:
    if foo == bar @loop_end else @loop_body

loop_body:
    foo ++
    @loop_start

loop_end:
    halt

Riscrithm supports immediate assignments and compound math shorthands. The compiler is smart enough to append the i suffix (e.g., addi, xori) when it detects you are working with an immediate integer instead of a register.

  • Load/Move: load foo = 100, move bar = foo
  • Math: foo += 5, bar *= baz, foo <<= 2
  • Increments: foo ++ (addi foo, foo, 1), bar -- (addi bar, bar, -1)
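
As a rough illustration of the immediate-suffix rule, the Go sketch below picks between the register and immediate form of an instruction. The names (regOps, immOps, lowerBinary) are hypothetical. Two behaviors are my assumptions rather than documented ones: subtraction of a literal is emitted as addi with the negated value (RISC-V has no subi, and this matches the bar -- example above), and operators without an immediate form, such as *, are rejected when given a literal. The sketch also skips the 12-bit range check that real immediates need.

```go
package main

import (
	"fmt"
	"strconv"
)

// Register-register mnemonics for the operators listed in this document.
var regOps = map[string]string{
	"+": "add", "-": "sub", "&": "and", "^": "xor",
	"<<": "sll", ">>": "srl", "*": "mul", "/": "div", "%": "rem",
}

// Immediate mnemonics. The M-extension operations have no immediate form.
var immOps = map[string]string{
	"+": "addi", "&": "andi", "^": "xori", "<<": "slli", ">>": "srli",
}

// lowerBinary emits "dst = lhs op rhs", choosing the immediate variant when
// rhs parses as an integer literal.
func lowerBinary(dst, lhs, op, rhs string) (string, error) {
	imm, err := strconv.ParseInt(rhs, 0, 64)
	if err != nil {
		// Not a literal: treat rhs as a register name.
		m, ok := regOps[op]
		if !ok {
			return "", fmt.Errorf("unknown operator %q", op)
		}
		return fmt.Sprintf("%s %s, %s, %s", m, dst, lhs, rhs), nil
	}
	if op == "-" {
		// No subi exists, so subtract by adding the negated constant.
		return fmt.Sprintf("addi %s, %s, %d", dst, lhs, -imm), nil
	}
	m, ok := immOps[op]
	if !ok {
		return "", fmt.Errorf("operator %q has no immediate form", op)
	}
	return fmt.Sprintf("%s %s, %s, %d", m, dst, lhs, imm), nil
}

func main() {
	for _, c := range [][4]string{
		{"foo", "foo", "+", "5"},
		{"bar", "bar", "*", "baz"},
		{"foo", "foo", "<<", "2"},
	} {
		asm, err := lowerBinary(c[0], c[1], c[2], c[3])
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(asm)
	}
}
```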

The ^^ Shorthand: Want to clear a register fast? Use the XOR-self operator ^^. foo ^^ translates to xor foo, foo, foo, immediately zeroing out the register.

Need to swap two registers without a temporary third register? The swap command uses a non-destructive triple-XOR sequence:

foo swap bar

Translates to:

xor foo, foo, bar
xor bar, foo, bar
xor foo, foo, bar
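
To see why the three XORs exchange the values, the same steps can be replayed on plain integers. This Go sketch only mirrors the arithmetic; xorSwap is a hypothetical name. One caveat worth knowing: the trick needs two distinct registers, because XOR-ing a register with itself in the first step would zero it.

```go
package main

import "fmt"

// xorSwap replays the three-instruction sequence on two 64-bit values.
func xorSwap(foo, bar uint64) (uint64, uint64) {
	foo = foo ^ bar // xor foo, foo, bar: foo holds the combined bits
	bar = foo ^ bar // xor bar, foo, bar: bar now holds the original foo
	foo = foo ^ bar // xor foo, foo, bar: foo now holds the original bar
	return foo, bar
}

func main() {
	a, b := xorSwap(10, 20)
	fmt.Println(a, b) // prints "20 10"
}
```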

7. Memory Operations (Stack & Heap)

Memory interaction requires strict data width extensions: .b (byte/8-bit), .w (word/32-bit), or .d (double-word/64-bit).

Stack commands automatically adjust the hardware stack pointer (sp) by the correct byte offset.

  • Push (->): foo -> stack.w (Decrements sp by 4, stores word)
  • Pop (<-): bar <- stack.d (Loads double-word, increments sp by 8)
  • Peek (=): baz = stack.b (Loads byte without moving sp)

Heap commands require you to provide a base address register using the & pointer syntax.

  • Store (->): foo -> heap.w from &bar (Stores word from foo into address at bar)
  • Load (<-): baz <- heap.b from &foo (Loads byte into baz from address at foo)
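
The stack expansions can be sketched as two small string builders. This Go snippet is illustrative only: push, pop, and widthBytes are hypothetical names, and the 1-byte adjustment for .b is my assumption, since the text above only spells out 4 bytes for .w and 8 bytes for .d.

```go
package main

import "fmt"

// widthBytes maps a width suffix to the number of bytes sp moves.
// The value for "b" is an assumption; "w" and "d" follow the text above.
var widthBytes = map[string]int{"b": 1, "w": 4, "d": 8}

// push expands "reg -> stack.<width>": make room, then store at 0(sp).
func push(reg, width string) (string, error) {
	n, ok := widthBytes[width]
	if !ok {
		return "", fmt.Errorf("unknown width %q", width)
	}
	return fmt.Sprintf("addi sp, sp, -%d\ns%s %s, 0(sp)", n, width, reg), nil
}

// pop expands "reg <- stack.<width>": load from 0(sp), then release the slot.
func pop(reg, width string) (string, error) {
	n, ok := widthBytes[width]
	if !ok {
		return "", fmt.Errorf("unknown width %q", width)
	}
	return fmt.Sprintf("l%s %s, 0(sp)\naddi sp, sp, %d", width, reg, n), nil
}

func main() {
	asm, _ := push("foo", "w")
	fmt.Println(asm)
	asm, _ = pop("bar", "d")
	fmt.Println(asm)
}
```

Note the mirrored ordering: a push moves sp before storing, while a pop loads before moving sp, so 0(sp) always addresses the top element.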

8. Compound Snippet Example

Here is what a cohesive block of Riscrithm looks like with these features combined:

main:
    # Setup
    load foo = 10
    load bar = 20
    baz ^^

    # Math and Memory
    foo += 5
    foo -> stack.w
    bar *= foo
    baz <- heap.w from &bar

    # Branching
    if foo != bar @continue else @fail

continue:
    foo swap bar
    halt

fail:
    trap

9. The Compiler Architecture & Optimizer (-o / --optimize)

Let's clear something up: Riscrithm is not some bloated, complex multi-pass optimization engine. It operates on a lightning-fast two-pass system:

  1. Pass 1 (Sanitization): The compiler reads the source file, strips out all comments, standardizes the whitespace, and verifies the strict indentation rules. It gets the raw text completely clean.
  2. Pass 2 (Parse & Optimize): This is where the magic happens in a single pass. It parses the instructions, replaces macros, expands shorthands, and, if the -o flag is active, applies optimizations on the fly before writing the assembly.

When you compile with -o or --optimize, this second pass applies a lightweight AST sweep that cleans up your code in three distinct ways:
  • Dead Assignment Elimination: Consecutive duplicate modifications or redundant load/move sequences to the same register are discarded. (e.g., calling load foo = 128 twice in a row results in only one instruction).
  • Identity Math Elimination: Mathematical operations that leave the value completely unchanged are dropped entirely if the destination matches the source register (e.g., foo = foo + 0 or bar = bar / 1 are deleted).
  • Strength Reduction (Bitwise Folding): Multiplication and division are computationally expensive. If the optimizer catches you multiplying or dividing by a static power of two, it intercepts the instruction and rewrites it as a highly efficient bit-shift.
    • foo = bar * 2 translates to slli foo, bar, 1 (Shift Left Logical).
    • baz = foo / 8 translates to srli baz, foo, 3 (Shift Right Logical).
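
Two of these rewrites, identity elimination and strength reduction, can be sketched in a few lines of Go. This is not the optimizer's actual code: optimize is a hypothetical function, dead assignment elimination is left out, and the operand is modeled as an unsigned literal. Keep in mind that a logical right shift equals division only for unsigned or non-negative values, so the srli rewrite is shown here under that assumption.

```go
package main

import (
	"fmt"
	"math/bits"
)

// optimize inspects "dst = src op imm". It returns handled=true with an empty
// string when the statement can be dropped, handled=true with a shift when
// strength reduction applies, and handled=false when nothing applies.
func optimize(dst, src, op string, imm uint64) (asm string, handled bool) {
	// Identity math: adding or subtracting 0, multiplying or dividing by 1.
	identity := (imm == 0 && (op == "+" || op == "-")) ||
		(imm == 1 && (op == "*" || op == "/"))
	if identity && dst == src {
		return "", true
	}
	// A power of two has exactly one bit set; its exponent is the number of
	// trailing zero bits. Skip imm == 1, which would be a shift by zero.
	if (op == "*" || op == "/") && imm > 1 && bits.OnesCount64(imm) == 1 {
		shift := bits.TrailingZeros64(imm)
		if op == "*" {
			return fmt.Sprintf("slli %s, %s, %d", dst, src, shift), true
		}
		return fmt.Sprintf("srli %s, %s, %d", dst, src, shift), true
	}
	return "", false
}

func main() {
	asm, _ := optimize("foo", "bar", "*", 2)
	fmt.Println(asm)
	asm, _ = optimize("baz", "foo", "/", 8)
	fmt.Println(asm)
	_, handled := optimize("foo", "bar", "*", 3)
	fmt.Println("3 handled:", handled)
}
```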

10. Clean, Ready-to-Use Output

One of the best parts about Riscrithm is that the assembly file it spits out isn't an unreadable mess. The output .s file is automatically pretty-printed. Instructions inside blocks are neatly indented, labels sit flush to the margin, and the entire structure is completely human-readable. You can take the generated assembly and drop it directly into your hardware simulator, debugger, or desktop workflow without formatting a thing. Enjoy writing assembly without the headache. Happy coding!

11. Roadmap: What’s Brewing for v1.1.0?

Let’s be real: building a language alone is an iterative grind. While my current two-pass compiler engine handles the heavy lifting by separating symbol resolution from code generation, I am already actively breaking things behind the scenes to bring you a much more robust DX.

Here is what I am cooking up for the v1.1.0 release:

  • Proper Module Imports: Right now, splitting code across multiple files is a headache. I am working on a dedicated import system so you can natively break your codebase down into clean, reusable modules without breaking the build pipeline.
  • Better Error Handling: I know the current compiler diagnostics can be... cryptic. The next minor release will introduce accurate line/column tracking and actual, human-readable error messages instead of just blowing up your terminal.
  • Guard Clauses & Simple if Statements: You shouldn't be forced to write an empty else block just to satisfy the parser. I am updating the AST to natively support standalone if branches for cleaner, early-return guard patterns.

Contribution & Feedback

Have ideas for the syntax, or found an edge-case bug that completely broke the register allocation? Open an issue or drop a PR. This project is built by a developer, for developers. Let's make it better together.
