Writing an Asciidoc Parser in Rust: Asciidocr

Original link: https://www.bikesbooksandbullshit.com/bullshit/2025/01/08/writing-an-asciidoc-parser-in-rust.html

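Driven by a desire to build bespoke tools, the author set out to write an Asciidoc parser in Rust, aptly named "asciidocr". A longtime Asciidoc user, the author found the existing converters lacking, particularly the popular Ruby-based Asciidoctor, because of Ruby's performance and a personal dislike of the language. An initial exploration of Go was soon abandoned over stylistic preferences. Rust, however, proved a perfect fit. Despite having no formal computer science training, the author took on the challenge of building a parser from scratch, forgoing automated lexing tools in favor of a deeper, mentored learning experience. The project was a valuable educational exercise: a way to learn parser construction while addressing a practical need for a faster, more extensible Asciidoc converter. While acknowledging that there is room for improvement, the author expresses heartfelt pride in the finished "asciidocr" tool.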

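## Asciidocr: A New Rust-Based Asciidoc Parser

A new Asciidoc parser named "Asciidocr", written in Rust, has been released and discussed on Hacker News. The project prompted discussion about the rationale for building a new parser when Pandoc (a general-purpose document converter) already exists. A key point was that Pandoc's internal document representation may be limited compared with the richness of some source formats such as Asciidoc. This suggests that Asciidocr aims to parse Asciidoc's features more faithfully and completely. Even those exploring alternatives such as Typst welcomed the project as a positive development in the Asciidoc ecosystem. Users also pointed to an interesting related project, HTMLBook, a document-focused subset of HTML. Many commenters highlighted Asciidoc's strengths as a comprehensive documentation solution.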

Original text

I really only ever make something when I want something to exist that doesn’t already, or when I want something that does exist to more readily suit my (admittedly) idiosyncratic needs or thoughts about how it should exist. For better or worse, I have a lot of wants, and so I make a lot of things (e.g., Two Page Tuesday, or last night’s mostly-successful attempt at tapering a pair of pants I got at Global Thrift, or an early solve for the problem I’m solving here).

So: I wrote an asciidoc parser in Rust. I called it asciidocr because the Command-Line Rust book put an r after all the "clone a UNIX tool" projects, and I liked that convention.

Asciidoc is a lightweight markup language that is, in my opinion, the best one. Why it’s the best one is a separate issue entirely, but we can at least safely assume that it’s a good one, and the one that, for better or worse, I’ve been using to write nearly everything I’ve written for personal or professional use in the last five years or so. While it started as a Python project, it got new life (and a bunch of new features) when it was more or less taken over by the fine Asciidoctor folks, who wrote their converter in Ruby. It works very well, and does a lot of things. But.

It’s in Ruby, a language I have petty beef with and which, more importantly, is interpreted, not compiled, which means that for every new machine I want to convert asciidoc files on, I need to install Ruby. And there are some other things too: the way that templates must be written for custom output(s), the fact that it’s frankly a little slow, and whatever else.

But mostly it was the "I don’t want to have to write Ruby to extend the thing" that got me thinking. I was dreaming about a text-based writing management tool (like a Scrivener but for folks who use vim), and having already written a tool to make generating PDFs from asciidoc easier, I knew that if I wanted to write this next app in anything but Ruby, I’d need to either (a) subprocess out to the Ruby; (b) rely on the old asciidoc.py project, with its limitations (and also therefore limiting myself to writing in Python, which, like Ruby, means that if I wanted to share my tool, the folks using it would need to be able to install Python); or (c) find or build a converter in a different language. So after getting part of the way through an (a) implementation in Python, I cut my losses and started looking more readily into option (c), for: I was learning Rust and Go(lang).

There is, in fact, a pretty good Go implementation of an asciidoc parser/converter. And there was a hot second when it looked like my company might transition to Go for some backend stuff, so I picked up Powerful Command-Line Applications in Go and got to work. Unfortunately I realized pretty quickly that I am allergic to the following, oft-repeated pattern in the language:

  if err != nil {
    return err
  }

And then it became clear that we weren’t going to be using Go at work, so I dropped it.

Rust, on the other hand: boy-howdy did I love (and still do) working in that. And sure, there wasn’t a very feature-complete asciidoc parser or converter yet, but I liked the language and figured I could learn something: so I asked for some mentorship (thanks big time to Kit Dallege for everything that follows) and got to work.

My background is, of course, very humanities-focused. I mean, sure, there was a math minor in there somewhere, but that was all in service of a brief glimmer of a future doing philosophy of math, so. I’ve written a lot of code, and have been writing some kind of code or other since I was a small kid (thank you, hackable Geocities sites), but I have no "computer science education." Learning how to write a parser seemed like a good way to go.

And instead of relying on a parser-generator package (e.g., something like pest), where you write a grammar and the thing does the lexing and parsing for you, Kit recommended I do the whole thing by hand, since I’d learn more (and potentially it could be faster, or at least produce a smaller binary).
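To make "by hand" a little more concrete, here is a minimal sketch of what hand-written lexing can look like for a tiny slice of Asciidoc. It is an illustration only, not asciidocr's actual code: the `Token` enum and the `lex_line` and `lex` functions are hypothetical names invented for this example, and the subset handled (section titles, `*` list items, blank lines, plain text) is an assumption chosen for brevity.

```rust
/// Hypothetical token kinds for a tiny Asciidoc subset (not asciidocr's real types).
#[derive(Debug, PartialEq)]
enum Token {
    /// A title line such as `== Title`; `level` is the number of `=` signs.
    Heading { level: usize, text: String },
    /// An unordered list item such as `* item`.
    ListItem(String),
    /// An empty line, which separates blocks.
    Blank,
    /// Any other line, treated as paragraph text.
    Text(String),
}

/// Classify a single line by inspecting its leading characters directly,
/// rather than matching it against a generated grammar.
fn lex_line(line: &str) -> Token {
    let trimmed = line.trim_end();
    if trimmed.is_empty() {
        return Token::Blank;
    }
    // Count leading '=' characters. A heading needs at least one of them,
    // followed by a space. '=' is one byte, so the count is a valid byte index.
    let level = trimmed.chars().take_while(|&c| c == '=').count();
    if level > 0 {
        if let Some(text) = trimmed[level..].strip_prefix(' ') {
            return Token::Heading { level, text: text.trim().to_string() };
        }
    }
    if let Some(text) = trimmed.strip_prefix("* ") {
        return Token::ListItem(text.trim().to_string());
    }
    // Anything else (including a bare `====` delimiter) falls through as text.
    Token::Text(trimmed.to_string())
}

/// Lex a whole document line by line.
fn lex(input: &str) -> Vec<Token> {
    input.lines().map(lex_line).collect()
}
```

With a grammar-based tool, the rules above would live in a separate grammar file and the matching code would be generated; written by hand, every decision (what counts as a heading, what falls through to plain text) is ordinary Rust that can be stepped through and tuned.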

So that’s more or less what I did. It’s not perfect; it could, of course, be improved; there are some decisions I made early on that I would not make today, knowing what I know now; and I am very fucking proud of it. So we can dig in.
