Offline Math: Converting LaTeX to SVG with MathJax

Original link: https://sigwait.org/~alex/blog/2025/10/07/3t8acq.html

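## Converting MathJax to standalone SVG with `mathjax-embed`

Pandoc's `--mathjax` option for rendering LaTeX math depends on a CDN, which causes problems for offline access and for compatibility (many ePub readers, for example). MathML offers a modern, browser-compatible solution and SVG offers portability, but converting existing MathJax HTML into a fully standalone format takes an extra step.

The `mathjax-embed` script solves this by parsing the HTML generated with `--mathjax` and replacing the MathJax spans with SVG images. The resulting document needs neither JavaScript nor external resources.

The script uses jsdom to load the HTML, configures MathJax in that environment, and then extracts the rendered SVG. jsdom is now fast enough for this task, which was not the case when the author first tried it in 2016. The process involves a custom resource loader and careful handling of jsdom's URL resolution.

The resulting HTML is self-contained, works offline, and removes the dependency on external servers. It is designed for use in a pipeline: `echo '...' | pandoc -s -f markdown --mathjax | mathjax-embed > output.html`. The full source code is available on GitHub.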

Original text

Latest update:

Pandoc can prepare LaTeX math for MathJax via its eponymous --mathjax option. It wraps formulas in <span class="math"> elements and injects a <script> tag that points to cdn.jsdelivr.net, which means rendering won't work offline or if the third-party server fails. You can mitigate this by providing your own copy of the MathJax library, but the mechanism still fails when the target device doesn't support JavaScript (e.g., many EPUB readers).

At the same time, practically all browsers support MathML. Use it (pandoc's --mathml option) if you care only about the information superhighway: your formulas will look good on every modern device and scale delightfully. Otherwise, SVGs are the only truly portable option.

Now, how can we transform the HTML produced by

$ echo 'Ohm'\''s law: $I = \frac{V}{R}$.' |
  pandoc -s -f markdown --mathjax

into a fully standalone document where the formula gets converted into SVG nodes?

  1. Use an HTML parser like Nokogiri, and replace each <span class="math"> node with an image. There are multiple ways to convert a TeX-looking string to an SVG: using MathJax itself (which provides a corresponding CLI example), or by doing it in a 'classical' fashion with pdflatex. (You can read more about this method in A practical guide to EPUB, chapters 3.4 and 4.6.)
  2. Alternatively, load the page into a headless browser, inject MathJax scripts, and serialise the modified DOM back to HTML.
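
The first approach can be sketched in JavaScript, the language used in the rest of this post. This is only an illustration under stated assumptions: it swaps the HTML parser for a regular expression to stay dependency-free, it assumes pandoc writes the TeX source between \(...\) or \[...\] delimiters inside spans with class "math inline" or "math display", and texToSvg is a hypothetical callback standing in for whichever converter you pick (MathJax, pdflatex, and so on).

```javascript
// Assumed shape of pandoc --mathjax output:
//   <span class="math inline">\(I = \frac{V}{R}\)</span>
//   <span class="math display">\[ ... \]</span>
const MATH_SPAN =
  /<span class="math (inline|display)">\\[(\[]([\s\S]*?)\\[)\]]<\/span>/g

// Undo the HTML escaping of &, < and > inside the TeX source.
function decodeEntities(s) {
  return s.replace(/&lt;/g, '<').replace(/&gt;/g, '>').replace(/&amp;/g, '&')
}

// texToSvg(tex, isDisplay) is a hypothetical callback that returns SVG markup.
function replaceMath(html, texToSvg) {
  return html.replace(MATH_SPAN, (_match, kind, tex) =>
    texToSvg(decodeEntities(tex), kind === 'display'))
}

// Usage with a placeholder converter that does no real rendering:
const input = String.raw`<p>Ohm's law: <span class="math inline">\(I = \frac{V}{R}\)</span>.</p>`
const output = replaceMath(input, (tex, isDisplay) =>
  `<svg data-display="${isDisplay}"><title>${tex}</title></svg>`)
console.log(output)
```

A real HTML parser is the safer choice for arbitrary documents; the regular expression only keeps the sketch short.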

I tried the 2nd approach in 2016 with the now-defunct phantomjs. It worked, but debugging was far from enjoyable due to the strangest bugs in phantomjs. I can still run the old code, but it depends on an ancient version of the MathJax library that, for obvious reasons, isn't easily upgradable within the phantomjs pre-ES6 environment.

Nowadays, Puppeteer would certainly do, but for this kind of task I prefer something more lightweight.

There's also jsdom. Back in 2016, I tried it as well, but it was much slower than running phantomjs. Recently, I gave jsdom another try and was pleasantly surprised. I'm not sure what exactly tipped the scales: computers, v8, or jsdom itself, but it no longer feels slow in combination with MathJax.

$ wc -l *js *conf.json
  24 loader.js
 105 mathjax-embed.js
  12 mathjax.conf.json
 141 total

Roughly 50% of the code is Node.js infrastructure junk (including command-line parsing); the rest is a MathJax config and jsdom interactions:

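import { JSDOM } from 'jsdom'
import { Script } from 'node:vm'

// html, base, mathjax_conf, MyResourceLoader, cleanup() and read()
// come from the rest of the full source
let dom = new JSDOM(html, {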
  url: `file://${base}/`,
  runScripts: /* very */ 'dangerously',
  resources: new MyResourceLoader(), // block ext. absolute urls
})

dom.window.my_exit = function() {
  cleanup(dom.window.document) // remove mathjax <script> tags
  console.log(dom.serialize())
}

dom.window.my_mathjax_conf = mathjax_conf // user-provided

let script = new Script(read(`${import.meta.dirname}/loader.js`))
let vmContext = dom.getInternalVMContext()
script.runInContext(vmContext)

The most annoying step here is setting the url property, which jsdom uses to resolve relative resource paths. MathJax calls the my_exit() function when its job is supposedly finished. The loader.js script is executed in the context of the loaded HTML:
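
Why the url property is fiddly can be shown with the WHATWG URL class built into Node.js, which follows the same resolution rules. The directory name and the isLocal() predicate below are made up for illustration. isLocal() is one possible reading of "block ext. absolute urls", not the actual MyResourceLoader logic.

```javascript
// How a relative script path resolves against a file:// base URL.
const rel = 'mathjax/startup.js'

// The trailing slash decides whether the last path segment is kept.
const withSlash = new URL(rel, 'file:///tmp/book/').href
const withoutSlash = new URL(rel, 'file:///tmp/book').href

console.log(withSlash)    // file:///tmp/book/mathjax/startup.js
console.log(withoutSlash) // file:///tmp/mathjax/startup.js

// Hypothetical predicate: allow only file: URLs that stay under the base.
function isLocal(href, base) {
  const u = new URL(href, base)
  return u.protocol === 'file:' && u.href.startsWith(base)
}

const base = 'file:///tmp/book/'
console.log(isLocal('mathjax/startup.js', base))            // true
console.log(isLocal('https://cdn.jsdelivr.net/x.js', base)) // false
console.log(isLocal('../secret.js', base))                  // false
```

Dropping the trailing slash from the base silently moves every relative resource one directory up, which is easy to miss.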

window.MathJax = {
  output: { fontPath: '@mathjax/%%FONT%%-font' },
  startup: {
    ready() {
      MathJax.startup.defaultReady()
      MathJax.startup.promise.then(window.my_exit)
    }
  }
}

Object.assign(window.MathJax, window.my_mathjax_conf)

function main() {
  var script = document.createElement('script')
  script.src = 'mathjax/startup.js'
  document.head.appendChild(script)
}

document.addEventListener('DOMContentLoaded', main)
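
The handshake between the host script and loader.js can be tried without jsdom or MathJax. The sketch below is a simplification under stated assumptions: a plain object stands in for dom.window, the LOADER string is a cut-down imitation of loader.js, and the final ready() call plays the part of MathJax. It also shows that Object.assign() merges shallowly, so a user config carrying its own startup key (hypothetical here) would replace the ready() hook and my_exit() would never run.

```javascript
import vm from 'node:vm'

// Cut-down imitation of loader.js; the last line stands in for MathJax,
// which would normally call startup.ready() itself.
const LOADER = `
window.MathJax = {
  output: { fontPath: '@mathjax/%%FONT%%-font' },
  startup: { ready() { window.my_exit() } }
}
Object.assign(window.MathJax, window.my_mathjax_conf)
if (typeof window.MathJax.startup.ready === 'function')
  window.MathJax.startup.ready()
`

// Runs LOADER inside a fresh vm context and returns the fake window.
function runLoader(userConf) {
  const win = { exited: false, my_mathjax_conf: userConf }
  win.window = win                       // let the script refer to `window`
  win.my_exit = () => { win.exited = true }
  vm.createContext(win)
  new vm.Script(LOADER).runInContext(win)
  return win
}

// A config without a startup key keeps the hook intact.
const ok = runLoader({ tex: { tags: 'ams' } })
console.log(ok.exited, ok.MathJax.tex.tags)   // true ams

// A config with its own startup key overwrites ready(): no exit callback.
const clobbered = runLoader({ startup: { typeset: false } })
console.log(clobbered.exited)                 // false
```

If user configs may contain startup or output sections, a deep merge or re-attaching ready() after the merge would avoid the second case.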

The full source is on GitHub.

Intended use is as follows:

$ echo 'Ohm'\''s law: $I = \frac{V}{R}$.' |
  pandoc -s -f markdown --mathjax |
  mathjax-embed > 1.html

The resulting HTML doesn't use JavaScript and doesn't fetch any external MathJax resources. The mathjax-embed script itself always works offline.


Tags: ойті
Authors: ag