Hey there. If you're looking at this, you are probably getting your hands dirty with Riscrithm, a high-level macro-assembly dialect that compiles straight down to pure RISC-V assembly. Think of it as a bridge between the readability of a high-level language and the raw, deterministic control of bare-metal hardware. Let's dive straight into how the compiler works, the syntax rules, and what's happening under the hood.
To compile your source code, you'll use the riscrithm CLI tool. The syntax is straightforward:
```text
riscrithm "source_code_file" "assembly_target_file" [-o/--optimize]
```
- Source Code: Your Riscrithm input file.
- Target File: The generated .s assembly file. If this file doesn't exist, the compiler will create it for you on the fly.
- Optimization: Pass -o or --optimize to enable the optimization sweep (more on the compiler architecture later).
Every Riscrithm file must declare its target section and entrypoint at the very top. Apart from label declarations themselves, these lines and macro definitions are the only lines allowed to sit unindented outside a label block.
- header: Sets the assembly section. For instance, header default translates to .section .text.
- entrypoint: Defines where the program starts. Passing entrypoint main translates to .globl main.
```
header default
entrypoint main
```
You can define text-replacement macros using the define keyword. This is perfect for aliasing registers or creating single-line inline functions. Here are some classic developer examples:
```
define foo = x1
define bar = x2
define baz = x3
define horseBattery = x4
define apple = 10
define orange = 20
define clearFoo = foo ^^
```
Whenever the parser sees foo, it replaces it with x1 before processing any actual logic.
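To make the substitution rule concrete, here is a minimal Python sketch of define-style expansion. It assumes (the compiler's internals are not documented here) that macro names are matched on whole-word boundaries and that expansion repeats until nothing changes, which is what lets clearFoo resolve through foo to x1. parse_defines and expand are hypothetical helpers, not part of any Riscrithm API.

```python
import re


def parse_defines(lines):
    """Collect `define name = value` lines into a dict (hypothetical helper)."""
    macros = {}
    for line in lines:
        match = re.fullmatch(r"define\s+(\w+)\s*=\s*(.+)", line.strip())
        if match:
            macros[match.group(1)] = match.group(2).strip()
    return macros


def expand(text, macros, max_depth=10):
    """Replace whole-word macro names until the text stops changing.

    Assumption: nested macros (clearFoo -> foo ^^ -> x1 ^^) are resolved
    by repeated passes; a cycle raises instead of looping forever.
    """
    if not macros:
        return text
    # \b word boundaries: `foo` is replaced, `food` is left alone.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, macros)) + r")\b")
    for _ in range(max_depth):
        expanded = pattern.sub(lambda m: macros[m.group(1)], text)
        if expanded == text:
            return expanded
        text = expanded
    raise ValueError("macro expansion did not settle (recursive define?)")
```

With the defines above, expand("load foo = apple", ...) yields load x1 = 10, and expand("clearFoo", ...) yields x1 ^^.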
Comments are written using the # symbol. The compiler strips out anything following a # on any line, so you can place them anywhere safely.
Riscrithm is strictly scoped via indentation.
Labels define your execution blocks and must end with a colon. They must not have any indentation. Conversely, every instruction inside a label must be indented (spaces or tabs). If you leave an instruction unindented, the compiler will throw a SyntaxError.
```
main:
    load foo = apple
    move bar = foo
```
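These rules are simple enough to check line by line. The Python sketch below strips # comments and enforces the indentation rules, raising SyntaxError as described above. It assumes that header, entrypoint, and define are the only unindented keywords besides labels, and it ignores !! raw blocks for brevity; sanitize is a hypothetical name, not the compiler's real function.

```python
TOP_LEVEL_KEYWORDS = ("header ", "entrypoint ", "define ")


def sanitize(source):
    """Strip comments, drop blank lines, and enforce the indentation rules.

    Returns (kind, text) tuples where kind is "directive", "label", or
    "instr". Hypothetical sketch, not the real compiler.
    """
    out = []
    in_label = False
    for lineno, raw in enumerate(source.splitlines(), start=1):
        line = raw.split("#", 1)[0].rstrip()  # everything after '#' is a comment
        if not line.strip():
            continue
        indented = line[0] in " \t"
        text = line.strip()
        if not indented:
            if text.startswith(TOP_LEVEL_KEYWORDS):
                out.append(("directive", text))
            elif text.endswith(":"):
                out.append(("label", text))
                in_label = True
            else:
                raise SyntaxError(f"line {lineno}: unindented instruction {text!r}")
        else:
            if not in_label:
                raise SyntaxError(f"line {lineno}: instruction outside a label")
            out.append(("instr", text))
    return out
```

Running it on the main: block above (with its indentation intact) returns one label followed by two instr entries; removing the indentation from load foo = apple raises SyntaxError.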
If you need to bypass the Riscrithm preprocessor and write raw RISC-V assembly, prefix your label with !!. The compiler strips the exclamation marks but passes everything inside that block completely untouched. Macros and shorthands will not expand here.
```
!!raw_block:
    li x1, 10
    foo ^^ # This stays exactly as written!
```
Here is the meat of the language. Riscrithm maps readable statements directly to hardware instructions.
Instead of remembering privilege-level opcodes, use these explicit system keywords:
| Riscrithm | RISC-V Assembly | Description |
|---|---|---|
| interrupt.u | uret | User-mode trap return |
| interrupt.s | sret | Supervisor-mode trap return |
| interrupt.m | mret | Machine-mode trap return |
| wait | wfi | Wait for interrupt (low-power state) |
| trap | ebreak | Debugger trap |
| halt | ecall | System environment call / halt |
| ... | nop | No-operation (ellipsis) |
Let’s talk about code style. To keep your Riscrithm source files readable and consistent, the compiler expects (and strongly encourages) a clean split in how you name your identifiers. Here is the naming convention breakdown:
- Variables & Registers (camelCase): Any variable alias or register macro you define should start with a lowercase letter, with each subsequent word capitalized.
  - Examples: firstNum, addressRegister, stackOffset
- Labels & Code Blocks (snake_case): Execution targets, loop boundaries, and conditional blocks use lowercase words separated by underscores. This makes them pop visually against instructions.
  - Examples: loop_start, on_true, error_handler
- Constants & Literals (SCREAMING_SNAKE_CASE): Hardcoded configuration values, static offsets, or global definitions that shouldn't change use all-uppercase letters separated by underscores.
  - Examples: DEFAULT_HEADER, MAX_BUFFER_SIZE, IMM_VALUE
Excluding the hardware system traps, branching, and arithmetic operators, here is the complete table of assignment, value-exchange, and memory operators supported by the compiler engine.
| Riscrithm Syntax | Category | Internal Expansion / Behavior | Target RISC-V Assembly |
|---|---|---|---|
| load reg = imm | Assignment | Direct immediate assignment | li reg, imm |
| move reg1 = reg2 | Assignment | Register-to-register copy | mv reg1, reg2 |
| reg1 swap reg2 | Value Exchange | Triple-XOR swap (no scratch register) | xor reg1, reg1, reg2<br>xor reg2, reg1, reg2<br>xor reg1, reg1, reg2 |
| reg -> stack.[b/w/d] | Stack Memory | Decrement sp, store byte/word/double-word | addi sp, sp, -offset<br>s[b/w/d] reg, 0(sp) |
| reg <- stack.[b/w/d] | Stack Memory | Load byte/word/double-word, increment sp | l[b/w/d] reg, 0(sp)<br>addi sp, sp, offset |
| reg = stack.[b/w/d] | Stack Memory | Peek value from top of stack | l[b/w/d] reg, 0(sp) |
| reg1 <- heap.[b/w/d] from &reg2 | Heap Memory | Base-register memory read (load) | l[b/w/d] reg1, 0(reg2) |
| reg1 -> heap.[b/w/d] from &reg2 | Heap Memory | Base-register memory write (store) | s[b/w/d] reg1, 0(reg2) |
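This section covers basic math operations, self-mutators, and compound shorthands. Remember, the compiler automatically rewrites regular operations to their immediate equivalents (addi, andi, etc.) when the right-hand side is an integer literal. Because RISC-V has no subi instruction, subtraction by a literal becomes addi with a negated immediate (bar -- compiles to addi bar, bar, -1), and the M-extension operations (*, /, %) have no immediate forms at all.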
| Riscrithm Syntax | Operator Type | Evaluated Expression |
|---|---|---|
| ++ | Self Operator | = + 1 |
| -- | Self Operator | = - 1 |
| ^^ | Self Operator | = ^ (Fast Register Clear) |
| += | Compound Tag | = + |
| -= | Compound Tag | = - |
| *= | Compound Tag | = * |
| /= | Compound Tag | = / |
| %= | Compound Tag | = % |
| <<= | Compound Tag | = << |
| >>= | Compound Tag | = >> |
| = + | Base Arithmetic | Addition (Supports immediate realignment) |
| = - | Base Arithmetic | Subtraction (Supports immediate realignment) |
| = & | Base Arithmetic | Bitwise AND (Supports immediate realignment) |
| = \| | Base Arithmetic | Bitwise OR (Supports immediate realignment) |
| = ^ | Base Arithmetic | Bitwise XOR (Supports immediate realignment) |
| = << | Base Arithmetic | Logical Shift Left (Supports immediate realignment) |
| = >> | Base Arithmetic | Logical Shift Right (Supports immediate realignment) |
| = * | Base Arithmetic | Hardware Multiplication (M-Extension) |
| = / | Base Arithmetic | Hardware Division (M-Extension) |
| = % | Base Arithmetic | Hardware Remainder (M-Extension) |
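The immediate rewrite can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the compiler's code: the operator-to-mnemonic tables follow the table above, subtraction by a literal is folded into addi, and out-of-range immediates or literal operands to *, /, and % are rejected (the real compiler's behavior in those cases may differ). Shift amounts are checked against RV64's limit of 63. lower_binop is a hypothetical name.

```python
# Register-form mnemonics per operator (assumed mapping, matching the table).
REG_OPS = {"+": "add", "-": "sub", "&": "and", "|": "or", "^": "xor",
           "<<": "sll", ">>": "srl", "*": "mul", "/": "div", "%": "rem"}

# Immediate forms that exist in the base ISA. There is no `subi`, and the
# M-extension (mul/div/rem) has no immediate variants at all.
IMM_OPS = {"+": "addi", "&": "andi", "|": "ori", "^": "xori",
           "<<": "slli", ">>": "srli"}


def as_int(token):
    """Return the integer value of a literal token, or None for a register."""
    try:
        return int(token, 0)  # accepts 10, -3, 0x1f, 0b101
    except ValueError:
        return None


def lower_binop(dst, lhs, op, rhs):
    """Lower `dst = lhs op rhs` to one RISC-V instruction (hypothetical sketch)."""
    imm = as_int(rhs)
    if imm is None:
        return f"{REG_OPS[op]} {dst}, {lhs}, {rhs}"
    if op == "-":
        # Subtracting a literal is the same as adding its negation.
        op, imm = "+", -imm
    if op not in IMM_OPS:
        raise ValueError(f"{op!r} has no immediate form; load the constant first")
    if op in ("<<", ">>"):
        in_range = 0 <= imm < 64          # shift amount, assuming RV64
    else:
        in_range = -2048 <= imm <= 2047   # 12-bit signed immediate
    if not in_range:
        raise ValueError(f"immediate {imm} is out of range for {IMM_OPS[op]}")
    return f"{IMM_OPS[op]} {dst}, {lhs}, {imm}"
```

For example, lower_binop("x1", "x1", "+", "5") gives addi x1, x1, 5, while a register right-hand side such as "x3" keeps the register form.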
To unconditionally jump, use the @ symbol:
```
@some_label # Compiles to: j some_label
```
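For conditional branching, Riscrithm uses an inline if/else ternary style. The compiler automatically maps your logic to beq, bne, blt, or bge, and swaps the operands at compile time to handle > and <= (a > b becomes b < a, and a <= b becomes b >= a). Since blt and bge are signed comparisons, operands are compared as signed integers.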
```
    if foo == bar @true_block else @false_block
    if foo > baz @greater_block else @lesser_block
```
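Here is a minimal Python sketch of how one such line could be lowered. It assumes the compiler emits a conditional branch to the true label followed by an unconditional j to the false label; lower_if and its lookup tables are hypothetical, and the real output may be arranged differently.

```python
# Direct mappings to RISC-V conditional branches (signed comparisons).
DIRECT = {"==": "beq", "!=": "bne", "<": "blt", ">=": "bge"}
# a > b is b < a, and a <= b is b >= a: swap operands and reuse blt/bge.
SWAPPED = {">": "blt", "<=": "bge"}


def lower_if(lhs, op, rhs, true_label, false_label):
    """Lower `if lhs op rhs @true else @false` to a branch plus a jump."""
    if op in DIRECT:
        branch = f"{DIRECT[op]} {lhs}, {rhs}, {true_label}"
    elif op in SWAPPED:
        branch = f"{SWAPPED[op]} {rhs}, {lhs}, {true_label}"
    else:
        raise ValueError(f"unsupported comparison {op!r}")
    return [branch, f"j {false_label}"]
```

With foo as x1 and baz as x3, if foo > baz @greater_block else @lesser_block lowers to blt x3, x1, greater_block followed by j lesser_block in this sketch.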
Riscrithm doesn't have a dedicated while or for keyword because you don't need them. You build loops the old-school way using labels, conditionals, and jumps.

An infinite loop:
```
infinite_loop:
    foo ++
    @infinite_loop
```
A conditional loop (counting foo up to bar):
```
main:
    load foo = 0
    load bar = 10
loop_start:
    if foo == bar @loop_end else @loop_body
loop_body:
    foo ++
    @loop_start
loop_end:
    halt
```
Riscrithm supports immediate assignments and compound math shorthands. The compiler is smart enough to append the i suffix (e.g., addi, xori) when it detects you are working with an immediate integer instead of a register.
- Load/Move: load foo = 100, move bar = foo
- Math: foo += 5, bar *= baz, foo <<= 2
- Increments: foo ++ (addi foo, foo, 1), bar -- (addi bar, bar, -1)
The ^^ Shorthand: Want to clear a register fast? Use the XOR-self operator ^^. foo ^^ translates to xor foo, foo, foo, immediately zeroing out the register.
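Need to swap two registers without a temporary third register? The swap command uses a non-destructive triple-XOR sequence that clobbers no scratch register. For example, foo swap bar translates to:
```
xor foo, foo, bar
xor bar, foo, bar
xor foo, foo, bar
```
The two operands must be different registers: if both name the same register, the first xor zeroes it and the value is lost.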
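The XOR identities behind the swap are easy to check in Python. This sketch models the register file as a plain dict (an illustrative stand-in, not compiler code) and also demonstrates the aliasing case where a register is swapped with itself.

```python
def xor_swap(regs, a, b):
    """Apply the three swap instructions to a dict modelling the register file."""
    regs[a] ^= regs[b]  # xor a, a, b   -> a holds a ^ b
    regs[b] ^= regs[a]  # xor b, a, b   -> b ^ (a ^ b) == a
    regs[a] ^= regs[b]  # xor a, a, b   -> (a ^ b) ^ a == b
    return regs
```

xor_swap({"x1": 15, "x2": 300}, "x1", "x2") returns {"x1": 300, "x2": 15}, while xor_swap({"x1": 42}, "x1", "x1") returns {"x1": 0}, which is exactly the aliasing hazard.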
Memory interaction requires explicit data width suffixes: .b (byte/8-bit), .w (word/32-bit), or .d (double-word/64-bit). The .d forms map to ld and sd, which exist only on RV64, so target a 64-bit core if you use them.
Stack commands automatically adjust the hardware stack pointer (sp) by the correct byte offset.
- Push (->): foo -> stack.w (Decrements sp by 4, stores word)
- Pop (<-): bar <- stack.d (Loads double-word, increments sp by 8)
- Peek (=): baz = stack.b (Loads byte without moving sp)
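Following the expansions in the memory operator table, a Python sketch of the three stack forms could look like this. The WIDTHS table and the push, pop, and peek names are illustrative assumptions, not the compiler's internals.

```python
# Width suffix -> (size in bytes, store mnemonic, load mnemonic).
# lb (and lw on RV64) sign-extend into the full register; ld/sd require RV64.
WIDTHS = {"b": (1, "sb", "lb"), "w": (4, "sw", "lw"), "d": (8, "sd", "ld")}


def push(reg, width):
    """reg -> stack.<width>: make room, then store at the new top."""
    size, store, _ = WIDTHS[width]
    return [f"addi sp, sp, -{size}", f"{store} {reg}, 0(sp)"]


def pop(reg, width):
    """reg <- stack.<width>: load from the top, then release the slot."""
    size, _, load = WIDTHS[width]
    return [f"{load} {reg}, 0(sp)", f"addi sp, sp, {size}"]


def peek(reg, width):
    """reg = stack.<width>: load from the top without moving sp."""
    _, _, load = WIDTHS[width]
    return [f"{load} {reg}, 0(sp)"]
```

If your Riscrithm code calls into C, keep in mind that the standard RISC-V calling convention expects sp to be 16-byte aligned at function entry. A 1- or 4-byte push leaves it misaligned until the matching pop.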
Heap commands require you to provide a base address register using the & pointer syntax. The access always uses offset 0 from that register.
- Store (->): foo -> heap.w from &bar (stores the word in foo to the address held in bar)
- Load (<-): baz <- heap.b from &foo (loads a byte into baz from the address held in foo)
Here is what a cohesive block of Riscrithm looks like with these features combined:
```
main:
    # Setup
    load foo = 10
    load bar = 20
    baz ^^
    # Math and Memory
    foo += 5
    foo -> stack.w
    bar *= foo
    baz <- heap.w from &bar    # bar must hold a valid address for this load
    # Branching
    if foo != bar @continue else @fail
continue:
    foo swap bar
    halt
fail:
    trap
```
Let's clear something up: Riscrithm is not some bloated, complex multi-pass optimization engine. It operates on a lightning-fast two-pass system:
- Pass 1 (Sanitization): The compiler reads the source file, strips out all comments, standardizes the whitespace, and verifies the strict indentation rules, leaving the raw text completely clean.
- Pass 2 (Parse & Optimize): This is where the magic happens. In a single sweep, the compiler parses the instructions, replaces macros, expands shorthands, and, if the -o flag is active, applies optimizations on the fly before writing the assembly. With -o or --optimize, this pass performs a lightweight AST sweep that cleans up your code in three distinct ways:
  - Dead Assignment Elimination: Redundant back-to-back load or move assignments to the same register are discarded. For example, writing load foo = 128 twice in a row emits only one instruction.
  - Identity Math Elimination: Operations that leave the value unchanged are dropped entirely when the destination matches the source register (e.g., foo = foo + 0 or bar = bar / 1 are deleted).
  - Strength Reduction (Bitwise Folding): Multiplication and division are computationally expensive. When the optimizer catches you multiplying or dividing by a constant power of two, it rewrites the instruction as a bit-shift:
    - foo = bar * 2 becomes slli foo, bar, 1 (Shift Left Logical).
    - baz = foo / 8 becomes srli baz, foo, 3 (Shift Right Logical). This matches div only for non-negative values: for a negative dividend, div rounds toward zero, while srli shifts zeros into the sign bit.
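The three rewrites can be sketched as one forward sweep in Python. The tuple format is a hypothetical stand-in for the compiler's AST: ("load", dst, imm), ("move", dst, src), or ("bin", dst, lhs, op, rhs) for dst = lhs op rhs. The identity rules cover only the cases named in the comments and may not match the real optimizer exactly.

```python
# Operators for which a literal right-hand side of 0 (or 1) changes nothing.
IDENTITY_ZERO = {"+", "-", "|", "^", "<<", ">>"}
IDENTITY_ONE = {"*", "/"}


def is_power_of_two(n):
    return n > 0 and n & (n - 1) == 0


def optimize(stmts):
    """One forward sweep: dead assignments, identity math, strength reduction."""
    out = []
    for stmt in stmts:
        kind = stmt[0]
        # 1. Dead assignment: an identical load/move directly repeated.
        if kind in ("load", "move") and out and out[-1] == stmt:
            continue
        if kind == "bin":
            _, dst, lhs, op, rhs = stmt
            if isinstance(rhs, int):
                # 2. Identity math: foo = foo + 0, bar = bar / 1, ...
                neutral = ((op in IDENTITY_ZERO and rhs == 0)
                           or (op in IDENTITY_ONE and rhs == 1))
                if neutral and dst == lhs:
                    continue
                # 3. Strength reduction: * or / by 2**k becomes a shift by k.
                #    The / rewrite is only exact for non-negative dividends.
                if op in ("*", "/") and is_power_of_two(rhs):
                    shift = "<<" if op == "*" else ">>"
                    stmt = ("bin", dst, lhs, shift, rhs.bit_length() - 1)
        out.append(stmt)
    return out
```

The signed caveat is concrete: in 64-bit two's complement, div gives -1 for -8 / 8, but srli on -8 by 3 yields 2**61 - 1.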
One of the best parts about Riscrithm is that the assembly file it spits out isn't an unreadable mess. The output .s file is automatically pretty-printed. Instructions inside blocks are neatly indented, labels sit flush to the margin, and the entire structure is completely human-readable. You can take the generated assembly and drop it directly into your assembler, hardware simulator, or debugger without reformatting a thing. Enjoy writing assembly without the headache. Happy coding!
Let’s be real: building a language alone is an iterative grind. While my current two-pass compiler engine handles the heavy lifting by separating sanitization from parsing and code generation, I am already actively breaking things behind the scenes to bring you a much more robust developer experience (DX). Here is what I am cooking up for the v1.1.0 release:
- Proper Module Imports: Right now, splitting code across multiple files is a headache. I am working on a dedicated import system so you can natively break your codebase down into clean, reusable modules without breaking the build pipeline.
- Better Error Handling: I know the current compiler diagnostics can be... cryptic. The next minor release will introduce accurate line/column tracking and actual, human-readable error messages instead of just blowing up your terminal.
- Guard Clauses & Simple if Statements: You shouldn't be forced to write an empty else block just to satisfy the parser. I am updating the AST to natively support standalone if branches for cleaner, early-return guard patterns.

Contribution & Feedback: Have ideas for the syntax, or found an edge-case bug that completely broke code generation? Open an issue or drop a PR. This project is built by a developer, for developers. Let's make it better together.