反向括号

反向括号
Inverse Parentheses

原始链接: https://kellett.im/a/inverse-parentheses

本文探讨了编程语言中的“反分组”概念——使用括号解组操作数的能力，大多数语言只允许分组，而缺乏此功能。作者通过超越传统的解析方法来应对这一挑战。认识到纯粹的解析器解决方案不太可能实现，他们从Python处理缩进的方式中获得灵感。提出的解决方案是修改*分词器*，以跟踪括号嵌套深度并为每个标记分配“友好度”分数。该分数有效地嵌入了优先级信息，从而使简化的解析器能够处理分组和反分组。虽然这引入了无限多个优先级级别，但使用“优先级爬升解析器”有效地管理它们。由此产生的系统可以正确解析非常规表达式，例如 `1 + (2 * 3)` 和 `(1 + 2) * 3`。作者最后建议进一步研究优化表达式处理，并探索可逆反分组解析器的可能性。

最近 Hacker News 上有一篇帖子讨论了名为“Inverse Parentheses”（kellett.im）的博客文章。但评论区大多表达了困惑和沮丧。许多人认为这篇文章缺乏实质内容，没有解释其核心概念——用括号“解组”——除了一个修辞性问题之外。有几位用户报告在使用 Brave 和 Safari 等浏览器在 macOS 上出现滚动问题，认为这是网站上的 CSS 代码（`overflow: hidden scroll`）造成的。Firefox 似乎能正确渲染该页面。讨论主要集中在文章缺乏内容和技术显示问题上，而不是对“反向括号”概念本身的争论，这可能是由于其不清晰的呈现方式所致。

原文

Have you ever noticed that lots of programming languages let you use parentheses to group operands, but none use them to ungroup them? No? Well let’s pretend this is a normal thing to be thinking about, and see what we can do about it.

Grouping with parentheses is relatively easy to add to a language grammar. The rule that accepts atomic things like 37 simply needs to also accept an opening paren,^{at which point it will recursively parse an entire expression, and then eat a closing paren.}

def parse_atom(lex):
    r = next(lex)
    if r[0] == 'integer':
        return int(r[1])
    elif r[0] == '(':
        expr = parse_expr(lex)
        s = next(lex)
        if s[0] != ')':
            raise ParseError("missing close paren")
        return expr
    else:
        raise ParseError(f"unexpected {r[0]}")

Anti-grouping isn’t quite as straightforward. Our parser can’t follow the structure of the parentheses, because then it wouldn’t be following the structure of the expression—the whole point is that these are dissimilar.

I don’t know if it’s possible to write a pure parser that does this. But purity is overrated anyway. I decided to take inspiration from another language with a weird grouping system.

Python

Did you know that Python’s grammar has braces? You just don’t type them. The tokeniser^{keeps track of the indentation level and inserts special tokens when it changes. The parser itself doesn’t need to worry about counting whitespace; it just sees blocks of statements bracketed by INDENT and DEDENT,^{which are easy to parse.}}

As it happens, Python’s tokeniser also knows when it’s inside a parenthesised expression. Indentation inside parens is not significant, and this is implemented by tracking the paren nesting depth and suppressing INDENT and DEDENT while it’s non-zero.

What if we used the same trick? Instead of trying to do all this in the parser somehow, the tokeniser could track its nesting depth, and emit a “friendliness” score for each token. Then we can simply parse operators in ascending order of friendliness.

In this model 1 + (2 * 3) will yield the following token stream:

We’ll leave the parentheses in the token stream, but all the parser needs to do with them is generate a syntax error if it finds one in the wrong place. Grouping will be handled entirely by the precedence levels embedded in the token stream.

A not-so-infinite climb

The tokeniser hack solves our parsing problem, but it creates another one: our language now has infinitely many precedence levels. I don’t feel like trying to do that with handrolled recursive descent, but a rummage through school textbooks suggests a precedence climbing parser is what we need. It deals with operators in the order it meets them, so having infinitely many possible precedences won’t bother it.

I hacked this together and it’s appropriately silly:

> (1 + 2) * 3
1 + 2 * 3
> 1 + (2 * 3)
(1 + 2) * 3

Something I particularly enjoy about the implementation I landed on is that if you increase friendliness instead of decreasing it, you end up with an ordinary^{parser. It’s also a good platform for other questionable syntactic innovations, like a language with no parentheses at all, using whitespace to weaken binding.}

Future work

While we’ve achieved a lot here today,^{we’ve also raised some important new questions. For instance, is it always necessary to double-parenthesise expressions in more complex cases?}

> ((1 * 2)) + (3 * 4)
1 * ((2 + 3) * 4)

And is it possible to have an anti-grouping parser that gives an involution when hooked up to an ordinary printer?

These are promising avenues for deeper study, and I’d love to hear from anyone who chooses to take them on.

反向括号 Inverse Parentheses

Python

A not-so-infinite climb

Future work

反向括号
Inverse Parentheses