不可能的优化，以及实现它的元编程

不可能的优化，以及实现它的元编程
The Impossible Optimization, and the Metaprogramming to Achieve It

原始链接: https://verdagon.dev/blog/impossible-optimization

这种优化将正则表达式匹配过程从运行时决策转换为编译时决策，在 Mojo 中实现。代码不再在程序执行期间确定正则表达式节点类型，而是利用编译时参数和泛型。具体来说，`nodes` 和 `node_idx`——之前是运行时参数——现在是 `_match_node` 函数中的编译时参数。一个 `@parameter if` 语句在编译时求值，有效地为正则表达式中的*每个*节点索引创建 `_match_node` 的一个专门版本。这导致该函数有 28 个不同的实例化，每个版本都针对特定的节点进行了定制。这种方法消除了运行时检查，例如 `node.isa[LiteralNode]` 和节点查找，因为这些决策是在编译期间做出的。编译器为每个节点预先计算必要的逻辑，从而通过减少开销并实现进一步的优化来显著提高性能。本质上，代码在编译时被“展开”，为正则表达式的每个部分创建一个高度高效、专门的匹配函数。

## 不可能的优化与元编程 verdagon.dev 上的一篇文章探讨了一种极致优化的技术——在编译时将正则表达式直接编译成代码，而不是依赖传统的正则表达式引擎。这允许编译器应用标准优化，可能带来显著的速度提升。讨论的重点是能够进行这种“编译时代码执行”的语言，包括 Mojo、Julia、D、Nim、Zig 和 C++（自 C++20 起）。许多评论者指出，类似的方法已经存在了几十年，并提到了 Lisp 宏、Lex/Flex 和 Scala LMS 等工具。核心思想是通过基于特定输入“展开”代码来避免运行时开销，但这可能会导致代码量增加。讨论还包括了用于管理生成代码大小的优化方法，例如共享常见的 AST 部分。一个实际的例子是通过快速检查 Gmail 等常见域名来优化电子邮件验证。此次讨论凸显了简单、潜在有效的优化与维护高效且可扩展代码生成之间更广泛的挑战之间的张力。

原文

Step 2: Make Regex a compile-time parameter, use compile-time operations

We just created this Regex at compile time.

Do we read it at run-time? Do we put it into a global or something?

No! We're going to only read it at compile-time.

"But Evan, we need to read the Regex at run-time! Otherwise, how does the run-time program even know what to do? How does it know what functions to call, what branches to take, etc.?"

Great question! The answer is that we can make all those decisions at compile time.

I know that doesn't make any sense, but bear with me.

Here's how we change our program.

Before:

fn _match_node(
    nodes: List[RegexNode],
    node_idx: Int,
    text: String,
    start_pos: Int
) -> MatchResult:
    var node = nodes[node_idx]
    if node.isa[LiteralNode]():
        ref literal_node = node[LiteralNode]
        return _match_literal(nodes, literal_node, text, start_pos)
    elif node.isa[RepeatNode]():
        ref repeat_node = node[RepeatNode]
        return _match_repeat(nodes, repeat_node, text, start_pos)
    elif ...

After:

fn _match_node[nodes: List[RegexNode], node_idx: Int](
    text: String,
    start_pos: Int
) -> MatchResult:
    alias node = nodes[node_idx]
    @parameter
    if node.isa[LiteralNode]():
        alias literal_node = node[LiteralNode]
        return _match_literal[nodes, literal_node](text, start_pos)
    elif node.isa[RepeatNode]():
        alias repeat_node = node[RepeatNode]
        return _match_repeat[nodes, repeat_node](text, start_pos)
    elif ...

There are two main changes:

The nodes and node_idx run-time parameters are now compile-time parameters, because they're inside [...]. We're using Mojo's generics system to hold these values at compile-time.
The @parameter if. @parameter means "at compile time" in Mojo-speak, so this if is run at compile-time, not run-time. This is like a constexpr if in C++ terms, or in C terms it's more like #define than if.

That means _match_node is now a generic function, a template.

There's only one nodes value (from the Regex), but there are 28 different node_idxs for this Regex. That means we'll have 28 different instantiations of this _match_node, one for each node_idx.

Previously, we had a very simple call tree, with only one _match_node, that was recursive. The call tree looked like this:

Now, we have 28 versions of _match_node, with a call tree that looks like this:

main

match[Regex instance]

_match_node[List(...), 0] (for whole expr)

_match_node[List(...), 1] (for first \w+)

_match_node[List(...), 2] (for \w)

_match_node[List(...), 3] (for (\+\w*)?)

_match_node[List(...), 4] (for \+\w*)

_match_node[List(...), 5] (for \+)
_match_node[List(...), 6] (for \w*)

_match_node[List(...), 7] (for \w)

_match_node[List(...), 8] (for @)
_match_node[List(...), 9] (for (\d+\.\d+\.\d+\.\d+|\w+\.\w+))

_match_node[List(...), 10] (for \d+\.\d+\.\d+\.\d+)

_match_node[List(...), 11] (for \d+)

_match_node[List(...), 12] (for \d)

_match_node[List(...), 13] (for \.)
_match_node[List(...), 14] (for \d+)

_match_node[List(...), 15] (for \d)

_match_node[List(...), 16] (for \.)
_match_node[List(...), 17] (for \d+)

_match_node[List(...), 18] (for \d)

_match_node[List(...), 19] (for \.)
_match_node[List(...), 20] (for \d+)

_match_node[List(...), 21] (for \d)

_match_node[List(...), 22] (for \w+\.\w+)

_match_node[List(...), 23] (for \w+)

_match_node[List(...), 24] (for \w)

_match_node[List(...), 25] (for \.)
_match_node[List(...), 26] (for \w+)

_match_node[List(...), 27] (for \w)

28 versions, each different only in their node_idx compile-time parameter.

To understand that better, let's look at the _match_node[List(...), 1] function for the first \w+.

When the compiler instantiates it as _match_node[List(...), 1], it would look like below. The @parameter if is evaluated at compile time, so only one of its branches is included. I'll leave them in as comments to illustrate:

fn _match_node[List(...), 1](  # Node 1 is a RepeatNode
    text: String,
    start_pos: Int
) -> MatchResult:
#   alias node = nodes[node_idx]
#   @parameter
#   if node.isa[LiteralNode]():
#       alias literal_node = node[LiteralNode]
#       return _match_literal[nodes, literal_node](text, start_pos)
#   elif node.isa[RepeatNode]():
#       alias repeat_node = node[RepeatNode]
        return _match_repeat[List(...), RepeatNode(2, 1, 0)](text, start_pos)
#   elif ...

The benefit here is that we don't have to do the node.isa[LiteralNode], node.isa[RepeatNode], etc. at run-time. That's pretty nice, it eliminates a bit of overhead.

It also executed the alias statements for us too, so we don't have to do those lookups (and bounds checks) at run-time. Another nice speedup!

We'll see the biggest benefit in the next section though.

不可能的优化，以及实现它的元编程 The Impossible Optimization, and the Metaprogramming to Achieve It

Step 2: Make Regex a compile-time parameter, use compile-time operations

不可能的优化，以及实现它的元编程
The Impossible Optimization, and the Metaprogramming to Achieve It