I prefer rST to Markdown

Original link: https://buttondown.email/hillelwayne/archive/why-i-prefer-rst-to-markdown/

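The author recently released an updated version (v0.2) of his programming guide, Logic for Programmers. It now supports the ePub format, includes sections on constraint solving and formal specification, and adds other features. The guide is available for download. The author wrote this book with a system called Sphinx, having published his previous book, Learn TLA+, the same way. Instead of Markdown he chose the reStructuredText (rST) format despite its steep learning curve, because it represents an abstract document tree rather than just a lightweight version of HTML. In rST, images and other elements go through a separate processing stage, whereas in Markdown the parser itself treats them as special cases. rST also allows new text objects to be added, such as a figure with a caption, where basic Markdown requires inserting HTML by hand. Another advantage is the ability to modify the final document structure with pre-render transforms, a feature Logic for Programmers uses extensively. For example, the guide includes solutions to its exercises alongside the problems, but for ease of access and organization the solutions appear at the end of the book. A custom transform moves each solution node away from its original location and links the exercise and solution to each other. The same approach also makes it straightforward to build a free sample of the book that contains only the solutions for the included chapters, without affecting the full guide. Finally, although rST's complex syntax may put some people off, the author praises its overall advantage over simpler systems such as Markdown, and names the lack of a uniform extension syntax and the lack of native support for pre-render transforms as inherent weaknesses of the latter.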

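Markdown is a simple markup language for converting plain text to HTML. It was first developed in 2004 and became popular with software developers because of its minimal syntax and ease of use. Although several better options for converting plain text to HTML were available at the time, developers integrated Markdown across applications such as content management systems (CMS), productivity apps, and document managers. Markdown's purpose goes beyond simple formatting, however: it aims to stay readable whether it is shown as plain text or rendered as HTML. The syntax deliberately limits the range of formatting options to reduce complexity, so it is easy to memorize and needs no extra toolbar. Because of that simplicity, it suits tasks such as comment fields, chat programs, version control system (VCS) messages, blogs, and article posts. Markdown is a poor fit for writing extensive documentation for enterprise products, but it remains valuable for those smaller tasks. Today it is used widely across platforms, often as the foundation of other applications. Its wide adoption and versatility make it a common choice in many online communities, especially among programmers who want to lightly enhance their writing. However, inconsistent implementations can produce unexpected edge cases that frustrate users, as Slack's handling of Markdown shows. Overall, Markdown is a practical and adaptable option for content creators, offering a user-friendly way to add style and structure to plain text.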

Original text

July 31, 2024

I will never stop dying on this hill

I just published a new version of Logic for Programmers! v0.2 has epub support, content on constraint solving and formal specification, and more! Get it here.

This is my second book written with Sphinx, after the new Learn TLA+. Sphinx uses a peculiar markup called reStructuredText (rST), which has a steeper learning curve than markdown. I only switched to it after writing a couple of books in markdown and deciding I needed something better. So I want to talk about why rst was that something.

Why rst is better

The most important difference between rst and markdown is that markdown is a lightweight representation of html, while rst is a midweight representation of an abstract documentation tree.

It's easiest to see this with a comparison. Here's how to make an image in markdown:

![alttext](example.jpg)

Technically, you don't even need a parser for this. You just need a regex to transform it into <img alt="alttext" src="example.jpg"/>. Most modern markdown engines do parse this into an intermediate representation, but the essence of markdown is that it's a lightweight html notation.
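The "you just need a regex" claim can be sketched in a few lines of Python. This is a minimal illustration, not how any real markdown engine works: the pattern name and the `render_images` function are made up for this example, and it ignores image titles, HTML escaping, and nested brackets.

```python
import re

# Matches the basic inline image form: ![alt text](url)
# Hypothetical pattern for illustration; real markdown has more cases.
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


def render_images(markdown_text: str) -> str:
    """Replace every basic markdown image with an <img> tag."""
    return IMAGE_PATTERN.sub(r'<img alt="\1" src="\2"/>', markdown_text)


print(render_images("![alttext](example.jpg)"))
# prints: <img alt="alttext" src="example.jpg"/>
```

Text without an image passes through unchanged, which is the whole appeal: there is no document model at all, only string substitution.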

Now here's how to make an image in rst:

.. image:: example.jpg
  :alt: alttext

.. image:: defines the image "directive". When Sphinx reads it, it looks up the registered handler for the directive, finds ImageDirective, invokes ImageDirective.run, which returns an image_node, which is an object with an alt field containing "alttext". Once Sphinx's processed all nodes, it passes the whole doctree to the HTML Writer, which looks up the rendering function for image_node, which tells it to output an <img> tag.

Whew, that's a mouthful. And for all that implementation complexity, we get… an interface that has 3x the boilerplate of markdown.

On the other hand, the markdown image is hardcoded as a special case in the parser, while the rst image is not. It was added in the exact same way as every other directive in rst: register a handler for the directive, have the handler output a specific kind of node, and then register a renderer for that node for each builder you want.
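The three steps (register a handler, have it output a node, register a renderer per builder) can be modeled in plain Python. This is a toy model under stated assumptions, not the Sphinx or docutils API: `Node`, `DIRECTIVES`, `RENDERERS`, and `build` are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy doctree node (hypothetical, not a docutils class)."""
    kind: str
    attrs: dict = field(default_factory=dict)


DIRECTIVES = {}  # directive name -> handler that returns a Node
RENDERERS = {}   # (builder name, node kind) -> function returning output text


def directive(name):
    """Register a handler for a directive name."""
    def register(handler):
        DIRECTIVES[name] = handler
        return handler
    return register


def renderer(builder, kind):
    """Register a renderer for one node kind under one builder."""
    def register(func):
        RENDERERS[(builder, kind)] = func
        return func
    return register


@directive("image")
def image_directive(argument, options):
    return Node("image", {"uri": argument, "alt": options.get("alt", "")})


@renderer("html", "image")
def image_to_html(node):
    return f'<img alt="{node.attrs["alt"]}" src="{node.attrs["uri"]}"/>'


@renderer("latex", "image")
def image_to_latex(node):
    return f'\\includegraphics{{{node.attrs["uri"]}}}'


def build(builder, name, argument, options):
    """Look up the directive handler, then the renderer for its node."""
    node = DIRECTIVES[name](argument, options)
    return RENDERERS[(builder, node.kind)](node)


print(build("html", "image", "example.jpg", {"alt": "alttext"}))
print(build("latex", "image", "example.jpg", {"alt": "alttext"}))
```

Nothing in `build` knows about images. Adding a new kind of text object means adding entries to the two registries, not touching the parser.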

This means you can extend Sphinx with new text objects! Say that instead of an <img>, you want a <figure> with a <figcaption>. In basic markdown you have to manually insert the html; with Sphinx you can just register a new figure directive. You can even make your FigureDirective subclass ImageDirective and have it do most of the heavy lifting.
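Here is a rough sketch of the subclassing idea. The classes below are simplified stand-ins written for this example; they borrow the names ImageDirective and FigureDirective from the text but are not the real Sphinx or docutils classes, and nodes are plain dicts to keep it short.

```python
class ImageDirective:
    """Toy stand-in for an image directive: builds an image node."""

    def __init__(self, argument, options, content=""):
        self.argument = argument
        self.options = options
        self.content = content

    def run(self):
        return {
            "kind": "image",
            "uri": self.argument,
            "alt": self.options.get("alt", ""),
        }


class FigureDirective(ImageDirective):
    """Reuses the image handling and wraps the result with a caption."""

    def run(self):
        image = super().run()  # the parent does the heavy lifting
        return {
            "kind": "figure",
            "children": [image],
            "caption": self.content.strip(),
        }


def render_html(node):
    """Hypothetical HTML renderer for the two toy node kinds."""
    if node["kind"] == "image":
        return f'<img alt="{node["alt"]}" src="{node["uri"]}"/>'
    if node["kind"] == "figure":
        inner = "".join(render_html(child) for child in node["children"])
        return f'<figure>{inner}<figcaption>{node["caption"]}</figcaption></figure>'
    raise ValueError(f"no renderer for node kind {node['kind']!r}")


figure = FigureDirective("example.jpg", {"alt": "alttext"}, "A caption").run()
print(render_html(figure))
```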

The second benefit is more subtle: you can transform the doctree before rendering it. This is how Sphinx handles cross-referencing: if I put a foo anchor in one document and :ref:`image <foo>` in another, Sphinx will insert the right URL during postprocessing. The transformation code is also first-class with the rest of the build process: I can configure a transform to only apply when I'm outputting html, have it trigger in a certain stage of building, or even remove a builtin transform I don't want to run.
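Cross-reference resolution is a good example of why a transform needs the whole doctree: a reference in one document can only be filled in once every document's anchors are known. The sketch below is a simplified two-pass model, not Sphinx's implementation. The node dicts, the `pending_ref` kind, and the `docname.html#id` URL scheme are assumptions made for illustration.

```python
def resolve_references(documents):
    """Fill in refuri on every pending reference.

    documents maps a document name to a flat list of node dicts.
    Nodes are modified in place; an unknown target raises KeyError.
    """
    # Pass 1: record which document defines each anchor.
    targets = {}
    for docname, nodes in documents.items():
        for node in nodes:
            if node["kind"] == "target":
                targets[node["id"]] = docname

    # Pass 2: rewrite each pending reference now that all anchors are known.
    for nodes in documents.values():
        for node in nodes:
            if node["kind"] == "pending_ref":
                docname = targets[node["target"]]
                node["kind"] = "reference"
                node["refuri"] = f"{docname}.html#{node['target']}"
    return documents


docs = {
    "a": [{"kind": "target", "id": "foo"}],
    "b": [{"kind": "pending_ref", "target": "foo", "text": "image"}],
}
resolve_references(docs)
print(docs["b"][0]["refuri"])
# prints: a.html#foo
```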

Now, most people may not need this kind of power! Markdown is ubiquitous because it's lightweight and portable, and rst is anything but. But I need that power.

One use case

Logic for Programmers is a math-adjacent book, and all good math books need exercises for the reader. It's easier to write an exercise if I can put it and the solution right next to each other in the document. But for readers, I want the solutions to show up in the back of the book. I also want to link the two together, and since I might want to eventually print the book, the pdfs should also include page references. Plus they need to be rendered in different ways for latex (pdf) output and epub output. Overall lots of moving parts.

To handle this I wrote my own exercise extension.

.. in chapter.rst
.. exercise:: Fizzbuzz
  :name: ex-fizzbuzz

  An exercise

.. solution:: ex-fizzbuzz

  A solution

.. in answers.rst

.. solutionlist::

How these nodes are processed depends on my compilation target. I like to debug in HTML, so for HTML it just renders the exercise and solution inline.

When generating epub and latex, though, things work a little differently. After generating the whole doctree, I run a transform that moves every solution node from its original location to under solutionlist. Then it attaches a reference node to every exercise, linking it to the new solution location, and vice versa. So it starts like this (using Sphinx's "pseudoxml" format):

-- chapter.rst
<exercise_node ids="ex-fizzbuzz">
  <title>
    Fizzbuzz
  <paragraph>
    An exercise
<solution_node ids="ex-fizzbuzz-sol">
  <paragraph>
    A solution

-- answers.rst
<solutionlist_node>

And it becomes this:

-- chapter.rst
<exercise_node ids="ex-fizzbuzz">
  <title>
    Fizzbuzz
  <paragraph>
    An exercise
    <exsol_ref_node refuri="/path/to/answers#ex-fizzbuzz-sol">
      Solution

-- answers.rst
<solutionlist_node>
  <solution_node ids="ex-fizzbuzz-sol">
    <paragraph>
      A solution
      <exsol_ref_node refuri="/path/to/chapter#ex-fizzbuzz">
        (back)
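The before and after trees can be reproduced with a small Python model of the transform. This is not the author's extension code, only a sketch of the same idea under stated assumptions: `Node` and `relocate_solutions` are hypothetical, documents are flat lists of nodes, a solution's id is assumed to be the exercise id plus "-sol", and the reference nodes are attached directly to the exercise and solution rather than inside their paragraphs.

```python
from dataclasses import dataclass, field

SUFFIX = "-sol"  # assumed naming convention: solution id = exercise id + "-sol"


@dataclass
class Node:
    """Toy doctree node (hypothetical, not a docutils class)."""
    kind: str
    ids: str = ""
    text: str = ""
    refuri: str = ""
    children: list = field(default_factory=list)


def relocate_solutions(chapter, answers, chapter_uri, answers_uri):
    """Move solution nodes under the solutionlist and cross-link them."""
    solution_list = next(n for n in answers if n.kind == "solutionlist")
    exercises = {n.ids: n for n in chapter if n.kind == "exercise"}

    remaining = []
    for node in chapter:
        if node.kind != "solution":
            remaining.append(node)
            continue
        exercise = exercises[node.ids[: -len(SUFFIX)]]
        # Link the exercise forward to the solution's new location...
        exercise.children.append(
            Node("exsol_ref", text="Solution", refuri=f"{answers_uri}#{node.ids}")
        )
        # ...and link the solution back to its exercise.
        node.children.append(
            Node("exsol_ref", text="(back)", refuri=f"{chapter_uri}#{exercise.ids}")
        )
        solution_list.children.append(node)
    chapter[:] = remaining  # the solutions no longer live in the chapter


chapter = [
    Node("exercise", ids="ex-fizzbuzz", text="An exercise"),
    Node("solution", ids="ex-fizzbuzz-sol", text="A solution"),
]
answers = [Node("solutionlist")]
relocate_solutions(chapter, answers, "/path/to/chapter", "/path/to/answers")

print([n.kind for n in chapter])
print(chapter[0].children[0].refuri)
print(answers[0].children[0].children[0].refuri)
```

Because the solution list is filled from whatever chapters are passed in, building from a subset of chapters yields a solution list covering only that subset.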

The LaTeX builder renders this by wrapping each exercise and solution in an answers environment, while the epub builder renders the solution as a popup footnote.

[Figure: An example of solution popups on an epub reader]

It's a complex dance of operations, but it works enormously well. It even helps with creating a "free sample" subset of the book: the back of the free sample only includes the solutions from the included subset, not the whole book!

"But I hate the syntax"

When I gush about rST to other programmers, this is the objection I hear the most: it's ugly.

To which I say, are you really going to avoid using a good tool just because it makes you puke? Because looking at it makes your stomach churn? Because it offends every fiber of your being?

...Okay yeah that's actually a pretty good reason not to use it. I can't get into lisps for the same reason. I'm not going to begrudge anybody who avoids a tool because it's ugly.

Maybe you'd find asciidoc more aesthetically pleasing? Or MyST? Or Typst? Or Pollen? Or even pandoc-extended markdown? There are lots of solid document builders out there! My point isn't that sphinx/rst is exceptionally good for large-scale documentation, it's that simple markdown is exceptionally bad. It doesn't have a uniform extension syntax or native support for pre-render transforms.

This is why a lot of markdown-based documentation generators kind of hack on their own preprocessing step to support new use-cases, which works for the most part (unless you're trying to do something really crazy). But they have to work around the markdown, not in it, which limits how powerful they can be. It also means that most programmer tooling can't understand it well. There's LSP and treesitter for markdown and rst but not for gitbook-markdown or md-markdown or leanpub-markdown.

But if you find a builder that uses markdown and satisfies your needs, more power to you! I just want to expose people to the idea that doc builders can be a lot more powerful than they might otherwise expect.

No newsletter next week

I'll be in Hong Kong.

Update 2024-07-31

Okay, since this is blowing up online I'm going to throw in a quick explanation of Logic for Programmers for all of the non-regulars here. I'm working on a book about how formal logic is useful in day-to-day software engineering. It starts with a basic rundown of the math and then goes into eight different applications, such as property testing, database constraints, and decision tables. It's still in the alpha stages but it's already 20k words and has a lot of useful content. You can find it here. Reader feedback highly appreciated!
