Visual regression tests for personal blogs

Original link: https://marending.dev/notes/visual-testing/

## Visual regression testing for a static site

The author builds a static site with Astro and MDX and used to manually spot-check older notes after CSS changes to make sure nothing broke unexpectedly. To gain more confidence when refactoring, they implemented visual regression testing with Playwright. Playwright automates a browser, takes screenshots of specified pages, and compares them against "golden" snapshots; any deviation beyond a configured threshold fails the test, prompting either a fix or an updated snapshot. This also yields a historical record of the site's appearance alongside each commit. The implementation consists of initializing a Playwright project and creating a test suite that iterates over a list of note pages, scrolls each one fully to load lazy-loaded images, and then captures a full-page screenshot for comparison. The author favors a simple workflow, running the tests manually after potentially impactful changes rather than automating them in CI/CD. The setup took only a couple of hours but adds real value by preventing visual regressions and increasing confidence in site modifications.

## Visual regression testing for blogs: discussion summary

A recent Hacker News discussion highlighted the benefits and challenges of visual regression testing (VRT) for personal blogs and websites. VRT means taking screenshots of a site and comparing them against baseline images to detect unintended visual changes, which is especially valuable during updates or refactorings. Tools like Playwright are popular and ship with built-in diffing. However, users such as the creator of Vizzly (built on Honeydiff) point out the limitations of naive pixel-by-pixel comparison and advocate more sophisticated analysis such as spatial clustering and perceptual scoring. While VRT can save time catching subtle bugs, it is not without obstacles: flaky tests, resource-intensive runs (parallel browser testing requires capable servers), large storage needs for screenshots and videos, and the lack of an easy per-commit view of changes. Some users mitigate the resource cost by running tests less frequently (e.g., nightly) and isolating the test environment. Despite the complexity, the consensus is that VRT is valuable, but a reliable setup takes effort.
## Original article

06 Jan 2026

Gaining confidence in refactorings

This website is built using Astro to generate static pages. I author the notes themselves in MDX, a nice extension to Markdown that allows inline HTML and other components. The static HTML produced by the build is then styled with some rather convoluted CSS. All this is to say that if I change, say, the margin of a list item only when it precedes an image element, that may have unintended consequences for an older note I don't look at often.

Whenever I make such changes, I find myself sampling older notes to see if something broke. Lately, I've had the idea of using Playwright for visual regression testing. For the uninitiated, this type of testing simply takes automated screenshots of pages (typically using a headless browser) and compares them against an earlier snapshot that is considered golden. Should the image deviate by more than some configurable threshold, the test is considered a failure. Then you either fix your application or, in the case of a legitimate change, simply update the golden snapshot for that particular test.

These tests would come with the obvious upside of increasing confidence that changes don’t have unintended side-effects. Especially for a static website where the visual appearance is really all there is to it. But further, because I check the screenshots into the git repo, I get an automatic history of what the site looks like at the time of the commit.

## Technical implementation

Playwright makes this quite easy out of the box. After initializing a new test project with `npm init playwright@latest` in the same repo as the website itself, I add this single test file:

```ts
import { test, expect } from "@playwright/test";

const notes = [
  "/",
  "/projects/",
  "/about/",
  "/notes/launchd/",
  "/notes/jour/",
  "/notes/reflective/",
  "/notes/monitoring/",
  "/notes/otel/",
  "/notes/llm/",
  "/notes/clickhouse/",
  "/notes/server-setup/",
  "/notes/jpeg-raw/",
  "/notes/go-rest-quest/",
  "/notes/responsive-plots/",
  "/notes/co2-loft/",
  "/notes/sqlite-vs-duckdb/",
  "/notes/unstructured-data/",
  "/notes/rest-quest/",
  "/notes/fieldnotes/",
  "/notes/rust-spa/",
  "/notes/16-hour-projects/",
  "/notes/wasm-benchmark/",
  "/notes/vps-benchmarks/",
  "/notes/sqlite-benchmarks/",
  "/notes/league-rating/",
  "/notes/league-data/",
  "/notes/co2-bedroom/",
  "/notes/esp-protocol/",
  "/notes/esp-power/",
  "/notes/performance/",
  "/notes/website/",
  "/feedback/",
];

test.describe("Visual regression", () => {
  const baseUrl = "https://marending.dev";

  for (const note of notes) {
    test(`capture page: ${note}`, async ({ page }) => {
      const url = `${baseUrl}${note}`;

      await page.goto(url);
      await page.waitForTimeout(200);

      // Scroll through the whole page step by step so that
      // lazy-loaded images are triggered before the screenshot.
      const pageHeight = await page.evaluate(() => document.body.scrollHeight);

      for (let scrolled = 0; scrolled < pageHeight; scrolled += 200) {
        await page.mouse.wheel(0, 200);
        await page.waitForTimeout(200);
      }

      await page.waitForLoadState("networkidle");

      // Derive a stable file name from the path, e.g. "/notes/llm/" -> "notes-llm".
      const screenshotName =
        note
          .replace(/^\//, "")
          .replace(/\/$/, "")
          .replace(/[^a-z0-9]/gi, "-")
          .toLowerCase() || "index";

      // Compare against the stored golden snapshot (or fail if none exists).
      await expect(page).toHaveScreenshot(`${screenshotName}.png`, {
        fullPage: true,
      });
    });
  }
});
```

There are a couple of things to note here. The magic happens in the `await expect(page).toHaveScreenshot(...)` line, which makes Playwright take a screenshot and compare it against a stored one. If no screenshot with this name exists yet, the test fails, and you first have to generate the golden snapshot by running the suite with `--update-snapshots`.
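The comparison tolerance can also be configured project-wide rather than per assertion. A minimal sketch of what that could look like, where the threshold value and `testDir` are illustrative assumptions and not taken from the author's actual config:

```typescript
// playwright.config.ts — illustrative sketch; the values here are
// assumptions, not the author's real configuration.
import { defineConfig } from "@playwright/test";

export default defineConfig({
  testDir: "./tests",
  expect: {
    toHaveScreenshot: {
      // Fail only if more than 0.2% of pixels differ from the golden snapshot,
      // which absorbs tiny anti-aliasing differences between runs.
      maxDiffPixelRatio: 0.002,
    },
  },
});
```

With a config like this in place, `npx playwright test` runs the comparisons and `npx playwright test --update-snapshots` regenerates the goldens after a legitimate change.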

Second, there is some complication involved in taking full-page screenshots. My website lazy-loads images, which means that on a sufficiently long page some images aren't loaded yet. I wouldn't mind so much if it weren't flaky: I noticed that sometimes particular images were loaded and sometimes not, which defeats the purpose when you are looking for pixel differences between snapshots. That is what the scrolling logic in the test is for: I scroll down the whole page, 200 pixels at a time, to ensure all images are loaded in.
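As a side note, the stepping logic of that scroll loop could be factored into a small pure helper — a hypothetical refactoring, not part of the original test — which makes it trivial to unit-test on its own:

```typescript
// Hypothetical helper: compute the sequence of mouse-wheel deltas needed
// to scroll through a page of `pageHeight` pixels in increments of `step`.
function scrollSteps(pageHeight: number, step = 200): number[] {
  const deltas: number[] = [];
  for (let scrolled = 0; scrolled < pageHeight; scrolled += step) {
    deltas.push(step);
  }
  return deltas;
}
```

The loop in the test would then read `for (const delta of scrollSteps(pageHeight)) { await page.mouse.wheel(0, delta); await page.waitForTimeout(200); }`, keeping the browser interaction separate from the arithmetic.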

Lastly, the list of pages I want to test is statically defined in the notes variable. At first, I actually generated this dynamically by programmatically visiting the index page and extracting all link targets. In the current design of the site, this yields exactly all subpages. Another way would be to expose an "endpoint" on the site that lists all pages in the notes collection. Both approaches have the benefit of not requiring me to update the list manually when I publish a new note, but they come with the downside that I need to execute everything inside a single Playwright test.

You see, the test-inside-a-for-loop pattern above only works as expected when the array to iterate over is statically known, which the dynamic approaches rule out. You then have to deal with the whole test failing as soon as the first screenshot doesn't match, instead of getting a nice summary where each page is its own clean test.
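The link extraction in the dynamic variant could be sketched like this — a hypothetical helper whose name and filtering rules are my assumptions, not the author's code. It reduces a page's anchor hrefs to a deduplicated list of internal paths:

```typescript
// Hypothetical sketch of the dynamic approach: filter an index page's
// anchor hrefs down to internal, root-relative paths.
function internalPages(hrefs: string[]): string[] {
  const seen = new Set<string>();
  for (const href of hrefs) {
    // Keep only same-site, root-relative links; skip fragment links.
    if (!href.startsWith("/") || href.includes("#")) continue;
    // Normalize to a trailing slash so "/about" and "/about/" deduplicate.
    seen.add(href.endsWith("/") ? href : `${href}/`);
  }
  return [...seen];
}
```

The hrefs themselves would have to come from the browser (e.g. via a locator over `a` elements), and since Playwright registers tests at file load time, the whole iteration would then have to live inside one test — exactly the downside described above.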

## Workflow

So how do I use this? It would be easy to over-engineer it and run this in CI periodically or build it into my deployment script. Instead, I decided to keep it simple and stupid. I have this setup with the images checked into the same repo as the website itself and I run the tests whenever I feel like I’ve made changes that could affect some other part of the site. There is no point in burning energy by running them on every commit or constantly failing my deployment just to confirm that changing a typo on a page does in fact cause visual changes.

With such simplicity in mind, it’s easy to add real value to my workflow with maybe 2 hours of effort. I need to do more things like it.
