Markdown is holding you back

Original link: https://newsletter.bphogan.com/archive/issue-45-markdown-is-holding-you-back/

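## Beyond Markdown: Structuring Technical Content

Markdown is popular because it's simple and readable, and it has become the default choice for developer documentation. For large, complex projects, though, it has limitations. Markdown focuses on how content is *presented* rather than what the content *is*, which hurts machine readability and content reuse. Search engines, LLMs, and IDEs all benefit from semantic markup, which defines content elements such as "steps" or "commands" rather than just formatting.

The author advocates moving beyond Markdown's "implicit typing", where meaning has to be inferred, toward formats that provide explicit structure. Alternatives include:

* **reStructuredText & AsciiDoc:** balance expressiveness and ease of use; well suited to structured documentation sites.
* **DocBook (XML) & DITA:** powerful XML-based options for very large content sets that need syndication, reuse, and multi-channel publishing.

The key is to prioritize a rich, semantic source format and export to Markdown for developers' convenience. Markdown works for quick docs, but investing in structured markup ensures long-term maintainability, better machine understanding, and greater content flexibility.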

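## Markdown's Limitations and Alternatives

A recent article argues that Markdown, while popular for its simplicity, lacks the structural capabilities that complex documentation needs. It is also increasingly patched with HTML, which defeats its original purpose. Markdown does allow embedded HTML, but commenters noted that this reduces its appeal: the goal is to *avoid* raw HTML.

The discussion highlighted a core tension. Markdown excels at quick, readable content but falls short on features such as cross-references, complex styling, and semantic structure. Alternatives such as AsciiDoc, reStructuredText, and DocBook offer more power but also bring more complexity. Newer options such as Typst are gaining traction by balancing capability with modern tooling.

Many argued that Markdown's strength lies in its accessibility and plain-text readability, and that extending it often undermines those strengths. Ultimately, the best tool depends on the project: simple notes and extensive documentation call for different approaches. The rise of LLMs adds another factor. Machine readability is becoming more important and may push users toward more structured formats. Others, however, argued that LLMs should adapt to existing content rather than dictate format choices.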

Original article

I've used many content formats over the years, and while I love Markdown, I run into its limitations daily when I work on larger documentation projects.

In this issue, you'll look at Markdown and explore why it might not be the best fit for technical content, and what else might work instead.

Markdown is everywhere. It's human-readable, approachable, and has just enough syntax to make docs look good in GitHub or a static site. That ease of use is why it's become the default choice for developer documentation. I'm using Markdown right now to write this newsletter issue. I love it.

But Markdown's biggest advantage is its biggest drawback: it doesn't describe the content like other formats can.

Think about how your content gets consumed. Your content isn't just for human readers; machines use it too. Search engines index it, LLMs parse it, and both work from the HTML your systems publish. Markdown's basic syntax emits only a small subset of the semantic elements HTML provides.

IDE integrations can use your docs, too. And AI agents rely on structure to answer developer questions. If you're only feeding them plain-text Markdown documents to reduce the number of tokens you send, you're not providing as much context as you could.

Worse, when you want to reuse your content or syndicate it into another system, you quickly find out that Markdown is more a lowest common denominator than a source of truth, because not all Markdown flavors are the same.

There are other options you can use that give you more control. But first, let's look deeper into why you should move away from Markdown for serious work.

Markdown is "implicit typing" for content

If you're a developer, you know all about type systems in programming languages. Some languages use implicit, dynamic typing, in which the interpreter works out each value's type at runtime. These languages give you flexibility, but few guarantees. That's why many developers prefer languages that use explicit typing, where you declare types when you write the code. In those languages, the compiler doesn't just build your code; it guarantees that specific rules are followed. That's a major reason for the rise of TypeScript over JavaScript: compile-time guarantees.

Markdown is implicit typing. It lets you write quickly, but without constraints or guarantees. There's no schema. No way to enforce consistency. A heading in one file might be a concept, in another it might be a step, and there's no machine-readable distinction between the two.
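For contrast, here's the kind of consistency check a schema gives structured formats. This is a minimal Python sketch, not a real validator. The ALLOWED_CHILDREN and REQUIRED_CHILDREN rules are hypothetical and loosely modeled on a DITA task. Real toolchains validate against a DTD, RELAX NG, or XML Schema instead of hand-written code. You couldn't write rules like these for plain Markdown, because it has no element types to write rules about.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules standing in for a real schema (DTD, RELAX NG, or XSD),
# loosely modeled on a DITA task.
ALLOWED_CHILDREN = {
    "task": {"title", "taskbody"},
    "taskbody": {"prereq", "steps"},
    "steps": {"step"},
    "step": {"cmd"},
}
REQUIRED_CHILDREN = {"task": {"title"}, "step": {"cmd"}}


def validate(elem: ET.Element, path: str = "") -> list[str]:
    """Return every rule violation found in elem and its descendants."""
    here = f"{path}/{elem.tag}"
    child_tags = {child.tag for child in elem}
    errors = []
    allowed = ALLOWED_CHILDREN.get(elem.tag)
    if allowed is not None:  # elements without rules accept any children
        for tag in sorted(child_tags - allowed):
            errors.append(f"{here}: <{tag}> is not allowed here")
    for tag in sorted(REQUIRED_CHILDREN.get(elem.tag, set()) - child_tags):
        errors.append(f"{here}: missing required <{tag}>")
    for child in elem:
        errors.extend(validate(child, here))
    return errors


good = ET.fromstring(
    "<task><title>Install</title><taskbody><steps>"
    "<step><cmd>npm install my-library</cmd></step>"
    "</steps></taskbody></task>"
)
bad = ET.fromstring(
    "<task><taskbody><steps><p>npm install my-library</p></steps></taskbody></task>"
)

print(validate(good))  # []
print(validate(bad))
```

The second call reports two problems: the missing <title> and the <p> sitting where only <step> is allowed.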

To complicate things further, there are multiple flavors of Markdown, each with its own features and markup.

You think you're writing "Markdown," but what works in one tool may not render in another. Some Markdown processors support footnotes, others ignore soft line breaks, and some even require different formatting for code blocks. That inconsistency makes Markdown a shaky foundation for anything beyond the most basic document.

And then there's MDX, which people often use to extend Markdown with things it doesn't support.

Here's a typical MDX snippet:

```mdx
# Install

<Command>npm install my-library</Command>
```

That <Command> tag isn't Markdown at all; it's a React component. Instead of using a code block, the author chose to create a special component to standardize how all commands would display in the documentation.

It works beautifully on their site because their publishing system knows what <Command> means. But if they try to syndicate this content to another system, it breaks unless that system also implements the component. And even if it were supported elsewhere, there's no guarantee that the component would be implemented the same way.

MDX shows that even in Markdown-centric ecosystems, people instinctively add more expressive markup. They know plain Markdown isn't enough. They're reinventing semantic markup, but in a way that's custom, brittle, and not portable.

Why semantic markup matters

Semantic markup describes what content is, not just how it should look. It's the difference between saying "here's a bullet with some text" and "here's a step in a procedure." To a human, those may look the same on a page. To a machine or to a publishing pipeline, they are entirely different.

Web developers already went through all this with HTML. Before HTML5, you used the generic <div> as a logical container. HTML5 introduced <section>, <article>, <aside>, and many other elements that describe the content they contain.

Semantic markup matters for two important and related reasons:

  • Transformation and reuse. With semantic markup, you can publish the same content to HTML, PDF, ePub, or even plain Markdown. With Markdown as your source, you can't easily go to another format. You can't turn a bullet into a <step> or a paragraph into a <para> without guessing. You can't add context if it wasn't there to begin with, but you can strip out what you don't need when you transform the document, and you can choose how to present each thing in a consistent way.
  • Machine consumption. LLMs and agents can make better use of content that carries structure. A step marked as a <step> is unambiguous. A bullet point might be a step, or a note, or just a list item. The machine has to guess. This is why XML was a preferred mechanism for web services for a long time, and why JSON Schema exists.
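Here's how that difference plays out in practice. This small Python sketch extracts the procedure from the same content written two ways. Both the DITA-style task and the Markdown list are made-up examples. With the XML, the code selects exactly the elements typed as steps. With the Markdown, the best it can do is assume every bullet is a step, and the third bullet shows why that guess fails.

```python
import xml.etree.ElementTree as ET

# The procedure as a DITA-style task: each step is explicitly typed.
task = """
<task id="install">
  <title>Installation</title>
  <taskbody>
    <steps>
      <step><cmd>Install Node.js 22 or later.</cmd></step>
      <step><cmd>Run npm install my-library.</cmd></step>
    </steps>
  </taskbody>
</task>
"""

# The same procedure in Markdown, plus a bullet that is not a step.
markdown = """
## Installation

- Install Node.js 22 or later.
- Run npm install my-library.
- Works on Linux and macOS.
"""


def steps_from_xml(source: str) -> list[str]:
    """Select exactly the elements typed as steps; no guessing needed."""
    root = ET.fromstring(source.strip())
    return [cmd.text for cmd in root.findall("./taskbody/steps/step/cmd")]


def steps_from_markdown(source: str) -> list[str]:
    """Best effort: assume every bullet is a step, since nothing says otherwise."""
    return [line[2:] for line in source.splitlines() if line.startswith("- ")]


print(steps_from_xml(task))           # two steps
print(steps_from_markdown(markdown))  # three "steps", one of which isn't
```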

Let's explore four formats that give you more control over structure than plain Markdown.

reStructuredText

reStructuredText is a plain-text markup language from the Python/Docutils ecosystem that supports directives, roles, and structural semantics. It is the foundational format used by Sphinx for generating documentation.

```rst
Installation
============

.. code-block:: bash

   npm install my-library

.. note::
   This library requires Node.js ≥ 22.

See also :ref:`usage-guide`.
```

Here you see a code-block directive, an admonition (note), and an explicit cross-reference via :ref:. You'll find support for images, figures, topics, sidebars, pull quotes, epigraphs, and citations as well.

All of those encode semantics, not just presentation.

AsciiDoc

AsciiDoc aims to be human-readable but semantically expressive. It has attributes, conditional content, include mechanisms, and more.

Here's an example of AsciiDoc:

```asciidoc
= Installation
:revnumber: 1.2
:platform: linux
:prev_section: introduction
:next_section: create-project

[source,bash]
----
npm install my-library
----

NOTE: This library requires Node.js ≥ 22.

See <<usage,Usage Guide>> for examples.
```

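AsciiDoc has native support for metadata in the document header. Attributes like :revnumber: or :platform: let you parameterize content: you define them once, then reference them anywhere in the document as {revnumber} or {platform}.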
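Here's a simplified Python model of how attribute entries and references fit together. It handles only the basic `:name: value` and `{name}` forms. The render function and its overrides argument are hypothetical. The overrides argument stands in for attributes passed on the command line (for example, `asciidoctor -a platform=macos`). Asciidoctor's real attribute handling covers much more, including soft-set values, unsetting, and built-in attributes.

```python
import re
from typing import Optional

# Simplified forms only: ":name: value" entries and "{name}" references.
ENTRY = re.compile(r"^:([\w-]+):\s*(.*)$")
REFERENCE = re.compile(r"\{([\w-]+)\}")

source = """\
= Installation
:revnumber: 1.2
:platform: linux

These instructions apply to version {revnumber} on {platform}.
"""


def render(text: str, overrides: Optional[dict[str, str]] = None) -> str:
    """Substitute attribute references; overrides win over document entries."""
    overrides = overrides or {}
    attributes = dict(overrides)
    body = []
    for line in text.splitlines():
        entry = ENTRY.match(line)
        if entry:
            name, value = entry.groups()
            if name not in overrides:
                attributes[name] = value
            continue
        # Unknown references are left as-is, mirroring Asciidoctor's default.
        body.append(REFERENCE.sub(
            lambda m: attributes.get(m.group(1), m.group(0)), line))
    return "\n".join(body)


print(render(source))
print(render(source, {"platform": "macos"}))
```

One source produces a Linux version and a macOS version of the same sentence. That's the parameterization the attributes buy you.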

<<usage,Usage Guide>> is a cross-reference: it links to the element whose ID is usage and displays "Usage Guide" as the link text.

Like reStructuredText, AsciiDoc supports admonitions like NOTE and WARNING so you don't have to build your own custom renderer. It also has support for sidebars, and you can add line highlighting and callouts to your code blocks without additional extensions.

And if you're writing technical documentation, there's explicit support for marking up UI elements and keyboard shortcuts.

Using Asciidoctor, you can transform AsciiDoc into other formats, including HTML, PDF, ePub, and DocBook, which you'll look at next.

DocBook (XML)

DocBook is an XML-based document model explicitly designed for technical publishing. It expresses hierarchical and semantic structure in tags and attributes, enabling industrial-grade transformations.

Here's an example:

```xml
<article id="install-library">
  <title>Installation</title>
  <para><command>npm install my-library</command></para>
  <note><para>This library requires Node.js &gt;= 22.</para></note>
  <para>See <xref linkend="usage-chapter"/>.</para>
</article>
```

Every tag is meaningful: <command> vs <para>, <note> vs <xref>. You'll find predefined tags for function names, variables, application names, keyboard shortcuts, UI elements, and much more. Being able to mark up the specific product names and terminology you use makes it much easier to create glossaries and indexes, and DocBook has tags for defining index terms, too.
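To show how those tags pay off, this Python sketch builds a simple back-of-the-book index from <indexterm> elements and a product-name list from <application> elements. The fragment uses DocBook 4-style markup without a namespace, and the section ids and terms are made up. Real DocBook builds generate indexes through the stylesheets rather than through code like this. The sketch also doesn't handle nested sections.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Made-up DocBook 4-style fragment (no namespace).
doc = ET.fromstring("""\
<article id="guide">
  <section id="install">
    <title>Installation</title>
    <para>Install with <application>npm</application>.
      <indexterm><primary>npm</primary></indexterm>
      <indexterm><primary>installation</primary></indexterm></para>
  </section>
  <section id="upgrade">
    <title>Upgrading</title>
    <para>Run <command>npm update my-library</command>.
      <indexterm><primary>npm</primary></indexterm></para>
  </section>
</article>""")


def build_index(root: ET.Element) -> dict[str, list[str]]:
    """Map each primary index term to the ids of the sections that mark it."""
    index = defaultdict(list)
    for section in root.iter("section"):
        for term in section.iter("indexterm"):
            primary = term.findtext("primary")
            if primary and section.get("id") not in index[primary]:
                index[primary].append(section.get("id"))
    return dict(sorted(index.items()))


print(build_index(doc))
# {'installation': ['install'], 'npm': ['install', 'upgrade']}

# Product names marked with <application> can feed a glossary the same way.
print(sorted({app.text for app in doc.iter("application")}))  # ['npm']
```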

DocBook's rich ecosystem of XSLT stylesheets supports transforming to HTML, PDF, man pages, and even Markdown. Using DocBook ensures structure and validation at scale, as long as you use the tags it provides.
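DocBook's real toolchain does these transformations with XSLT stylesheets. Python's standard library has no XSLT processor, so this sketch uses a hand-written ElementTree walker instead. It handles only <title>, <para>, <command>, and <note>. The goal is to show the downward direction of the transform, not to replace the DocBook XSL stylesheets.

```python
import xml.etree.ElementTree as ET

article = ET.fromstring("""\
<article id="install-library">
  <title>Installation</title>
  <para>Run <command>npm install my-library</command>.</para>
  <note><para>This library requires Node.js &gt;= 22.</para></note>
</article>""")


def inline(elem: ET.Element) -> str:
    """Flatten inline markup; <command> becomes a Markdown code span."""
    parts = [elem.text or ""]
    for child in elem:
        inner = inline(child)
        parts.append(f"`{inner}`" if child.tag == "command" else inner)
        parts.append(child.tail or "")
    return "".join(parts)


def to_markdown(root: ET.Element) -> str:
    """Export a tiny DocBook subset to Markdown, discarding what Markdown can't hold."""
    blocks = []
    for child in root:
        if child.tag == "title":
            blocks.append(f"# {inline(child).strip()}")
        elif child.tag == "para":
            blocks.append(inline(child).strip())
        elif child.tag == "note":
            # Markdown has no note element, so fall back to a blockquote.
            text = " ".join(inline(p).strip() for p in child.iter("para"))
            blocks.append(f"> **Note:** {text}")
    return "\n\n".join(blocks)


print(to_markdown(article))
```

Going the other way would mean guessing which backticked spans were <command> elements and which were filenames or variables. In the output, they all look the same.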

Then there's DITA.

DITA (Darwin Information Typing Architecture)

DITA is a standard for writing, managing, and publishing content. It's a topic-based XML architecture with built-in reuse, specialization, and modular content design. It's an open standard, and it's widely used in enterprises for multi-channel, structured content that needs standardization and reuse.

Here's an example:

```xml
<task id="install">
  <title>Installation</title>
  <taskbody>
    <prereq>
      <note>This library requires Node.js &gt;= 22.</note>
    </prereq>
    <steps>
      <step><cmd>npm install my-library</cmd></step>
    </steps>
  </taskbody>
</task>
```

DITA defines types like <task> and <step>, which cleanly map to procedural structure. You can compose topics, reuse via content references (conrefs), and specialize as your domain evolves.
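A conref tells the processor to pull in another element, addressed by file name, topic id, and element id (`file.dita#topic-id/element-id`). This simplified Python sketch shows that resolution step. It reads from a hypothetical in-memory FILES mapping instead of files on disk. It ignores nested conrefs, conref ranges, and the attribute-handling rules in the DITA specification. For real builds, the DITA Open Toolkit handles all of that.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory file set; a real build reads topics from disk.
FILES = {
    "shared.dita": (
        '<topic id="shared"><title>Shared content</title><body>'
        '<note id="node-req">This library requires Node.js &gt;= 22.</note>'
        "</body></topic>"
    ),
    "install.dita": (
        '<task id="install"><title>Installation</title><taskbody>'
        '<prereq><note conref="shared.dita#shared/node-req"/></prereq>'
        "</taskbody></task>"
    ),
}


def find_by_id(root: ET.Element, elem_id: str) -> ET.Element:
    """Return root itself or the first descendant whose id matches."""
    if root.get("id") == elem_id:
        return root
    found = root.find(f".//*[@id='{elem_id}']")
    if found is None:
        raise KeyError(elem_id)
    return found


def resolve_conrefs(root: ET.Element, files: dict[str, str]) -> None:
    """Replace each element carrying a conref with its target, in place."""
    # Collect first, then mutate, so the tree never changes mid-iteration.
    pending = [(parent, i, child)
               for parent in root.iter()
               for i, child in enumerate(parent)
               if child.get("conref")]
    for parent, i, child in pending:
        filename, _, fragment = child.get("conref").partition("#")
        topic_id, _, element_id = fragment.partition("/")
        topic = find_by_id(ET.fromstring(files[filename]), topic_id)
        replacement = find_by_id(topic, element_id)
        replacement.tail = child.tail
        parent[i] = replacement


task = ET.fromstring(FILES["install.dita"])
resolve_conrefs(task, FILES)
print(task.find("./taskbody/prereq/note").text)
```

The Node.js requirement now lives in one shared topic. Any task that references it picks up future edits automatically.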

One of the more interesting features DITA provides is the ability to filter content and create multiple versions from a single document.
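DITA handles filtering with conditional-processing attributes, such as platform, and a DITAVAL file that lists which values to exclude. This Python sketch applies only the exclude action. Its rule is that an element is removed when every value in one of its conditional attributes is excluded. Real DITAVAL processing also supports include, passthrough, and flag actions, which this sketch ignores. The task content is made up for illustration.

```python
import xml.etree.ElementTree as ET

topic = ET.fromstring("""\
<task id="install"><title>Installation</title><taskbody><steps>
  <step platform="windows"><cmd>Run the installer.</cmd></step>
  <step platform="linux macos"><cmd>Run npm install my-library.</cmd></step>
  <step><cmd>Verify the installation.</cmd></step>
</steps></taskbody></task>""")

# A DITAVAL file that excludes Windows-only content.
ditaval = ET.fromstring("""\
<val>
  <prop att="platform" val="windows" action="exclude"/>
</val>""")


def exclusions(val: ET.Element) -> dict[str, set[str]]:
    """Collect the excluded values for each conditional attribute."""
    rules: dict[str, set[str]] = {}
    for prop in val.iter("prop"):
        if prop.get("action") == "exclude":
            rules.setdefault(prop.get("att"), set()).add(prop.get("val"))
    return rules


def is_excluded(elem: ET.Element, rules: dict[str, set[str]]) -> bool:
    """Drop elem if every value of any one conditional attribute is excluded."""
    for att, excluded in rules.items():
        values = elem.get(att, "").split()
        if values and all(v in excluded for v in values):
            return True
    return False


def apply_filter(elem: ET.Element, rules: dict[str, set[str]]) -> None:
    """Remove excluded descendants of elem, in place."""
    for child in list(elem):
        if is_excluded(child, rules):
            elem.remove(child)
        else:
            apply_filter(child, rules)


apply_filter(topic, exclusions(ditaval))
print([cmd.text for cmd in topic.iter("cmd")])
# ['Run npm install my-library.', 'Verify the installation.']
```

The "linux macos" step survives because only one of its values would need to remain included. A different DITAVAL file produces a different version from the same source.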

The DITA Open Toolkit and many enterprise tools handle rendering, transformation, and reuse pipelines.

Ew. XML.

Yes, XML. The syntax is more verbose than Markdown. Tooling is less ubiquitous than Markdown. Migration requires effort, and your team may resist the learning curve. For small docs, Markdown's features are often enough.

But if you're already bolting semantics onto Markdown with MDX or plugins or custom scripts, you're paying that complexity cost anyway, and you don't get the benefits of standardization or portability. You're building a fragile, custom semantic layer instead of adopting one that already works.

So where does that leave you?

If you're writing a quick README or a short-lived doc, Markdown is fine. It's fast, approachable, and does the job. If you're building a developer documentation site that needs some structure, reStructuredText or AsciiDoc are better choices. They balance expressiveness with usability. And if you're managing a large doc set that needs syndication, reuse, and multi-channel publishing, DocBook and DITA give you the semantics and tooling to make that process more manageable.

The key is to start with the richest format you can manage and export downward. Markdown makes a great output for developers. It's approachable and familiar. But be careful not to lock yourself into it as your source of truth, because you can't add context back as easily as you can strip it out.

  • I have a new book out. Check out Write Better with Vale. This book walks you through implementing Vale, the prose linter, on your next writing project to create consistent, quality content.
  • Tidewave.ai is a full-stack coding agent from the creators of the Elixir programming language. It supports Ruby on Rails, Phoenix, and React applications and has a free tier. You'll need an API key for OpenAI, Anthropic, or GitHub Copilot to use it.
  • Google's Chrome for Developers blog has a post on creating accessible carousels. It's worth the read if you have to implement one of these on your site.

Before the next issue, try getting some hands-on experience with a different format.

As always, thanks for reading. Share this issue with someone who you think would find this helpful.

I'd love to talk with you about this issue on Bluesky, Mastodon, Twitter, or LinkedIn. Let's connect!

Please support this newsletter and my work by encouraging others to subscribe and by buying a friend a copy of Write Better with Vale, tmux 3, Exercises for Programmers, Small, Sharp Software Tools, or any of my other books.
