WebGL Without a GPU

Original link: https://microlink.io/blog/webgl-without-a-gpu

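Microlink made its headless WebGL rendering about four times faster by switching its software rasterizer from Chrome's default, SwiftShader, to Mesa llvmpipe. Because Microlink's infrastructure runs on GPU-less Linux nodes, it relies on CPU-based emulation to render WebGL. SwiftShader is designed for conservative compatibility and is comparatively slow: capturing a 3D page took about 24 seconds. By configuring Chrome's `ANGLE` to use the system GL stack backed by Mesa llvmpipe (`--use-angle=gl`), the team took advantage of JIT compilation and multi-threading and cut render time to about 6 seconds. Getting there required a custom multi-stage Docker build to include a newer Mesa, plus a virtual display (`xvfb`) to provide a GL surface. To guard against silent failures, where Chrome falls back to a mode that appears to render successfully but is actually flat 2D, the team added CI assertions that verify the active renderer really is `llvmpipe`. This software approach does not match the performance of a physical GPU, but it eliminated the timeouts and stabilized WebGL screenshots across the whole browser fleet.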

Original text

WebGL is everywhere now: 3D maps, seat charts, product configurators, shader-art landing pages. It was also the slowest thing you could ask Microlink to screenshot. One Chrome flag fixed that.

A WebGL page (three.js) captured as an animated screenshot, rendered through Mesa llvmpipe on a GPU-less node

TL;DR

Our servers have no GPU. WebGL still has to render somewhere.
Chrome's default software path (SwiftShader) took ~24s per 3D page.
Pointing ANGLE at Mesa llvmpipe (--use-angle=gl) dropped it to ~6s.
The one-line flag is the easy part. The display, the from-source Mesa, and proving it stays on the fast path are the rest of the story.

Our browser fleet runs on commodity Linux nodes with no graphics card and no /dev/dri. Cheaper, simpler, fewer drivers to babysit. But WebGL is a GPU API, so without one, something has to emulate it on the CPU. Which something was the difference between a 24-second screenshot and a 6-second one.

Chrome hands WebGL to ANGLE, which translates it to whatever backend the platform has: Direct3D, Metal, native OpenGL or Vulkan, or a software renderer when there's no GPU.

With no GPU, that software renderer is the whole ballgame. Chrome can use one of two: SwiftShader, its bundled default, or the system OpenGL stack, which on our Linux nodes is Mesa llvmpipe. Same pixels on the CPU, wildly different speed.

SwiftShader emulates the whole pipeline conservatively, optimizing for "draws correctly anywhere." A heavy 3D scene takes ~24s; the 2D pages next to it, 2-3s.

llvmpipe is built differently:

It JITs to native code. LLVM compiles the live shader and GL state into real x86-64. No interpreter loop.
It is tiled and multi-threaded. It actually uses every core.

Several times faster, same output.

- '--use-angle=swiftshader',
+ '--use-angle=gl',

What you must not add back:

--disable-gpu silently forces SwiftShader on again. It is the most-copied flag in every headless tutorial.
--in-process-gpu kills the GL surface ANGLE needs.
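
One way to keep those flags from creeping back in is to build the launch arguments through a small guard. This is a minimal sketch, not Microlink's code: `buildChromeArgs` and `FORBIDDEN_FLAGS` are hypothetical names, and the only Chrome flags used are the ones named above.

```javascript
// Flags the article says undo the switch to llvmpipe.
const FORBIDDEN_FLAGS = ['--disable-gpu', '--in-process-gpu', '--use-angle=swiftshader']

// Hypothetical helper: returns Chrome args with ANGLE pointed at the system GL
// stack, and throws if a caller tries to re-add a flag that breaks it.
function buildChromeArgs (extraArgs = []) {
  const offending = extraArgs.filter(arg => FORBIDDEN_FLAGS.includes(arg))
  if (offending.length > 0) {
    throw new Error(`Flags incompatible with --use-angle=gl: ${offending.join(', ')}`)
  }
  // Drop any other --use-angle=... value so ours is the only one passed.
  const rest = extraArgs.filter(arg => !arg.startsWith('--use-angle='))
  return ['--use-angle=gl', ...rest]
}
```

Failing loudly at launch time is a deliberate choice here: a copied `--disable-gpu` would otherwise only show up later as slow or flat screenshots.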

--use-angle=gl has to bind a GL surface, which needs an X display, even headless. No display, and WebGL silently degrades to a flat 2D fallback: the screenshot still succeeds, the request still returns 200, and the output is wrong but plausible.

So every container boots a virtual display (Xvfb) before Chrome starts, with LIBGL_ALWAYS_SOFTWARE=1 pinning Mesa to llvmpipe.
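
A rough sketch of that boot order from Node, assuming the virtual display is Xvfb. `virtualDisplayConfig` is a hypothetical helper, and the display number and screen geometry are illustrative values, not taken from Microlink's setup.

```javascript
// Hypothetical helper: describes the Xvfb process to start before Chrome and
// the environment Chrome should inherit. Defaults are illustrative assumptions.
function virtualDisplayConfig (displayNumber = 99, screen = '1280x1024x24') {
  const display = `:${displayNumber}`
  return {
    command: 'Xvfb',
    // -screen <n> <WxHxD> sets the framebuffer; -nolisten tcp keeps X off the network.
    args: [display, '-screen', '0', screen, '-nolisten', 'tcp'],
    // DISPLAY gives ANGLE a surface to bind; LIBGL_ALWAYS_SOFTWARE pins Mesa to software.
    env: { DISPLAY: display, LIBGL_ALWAYS_SOFTWARE: '1' }
  }
}

// Usage sketch (not executed here):
// const { spawn } = require('node:child_process')
// const cfg = virtualDisplayConfig()
// const xvfb = spawn(cfg.command, cfg.args, { stdio: 'ignore' })
// ...then launch Chrome with env: { ...process.env, ...cfg.env }
```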

Ubuntu jammy's Mesa is too old for this, and the PPAs that used to backport it are gone. So the base image compiles its own:

meson setup build \
  -Dbuildtype=release -Dgallium-drivers=llvmpipe -Dvulkan-drivers= \
  -Dllvm=enabled -Dshared-llvm=enabled

llvmpipe only, no Vulkan, shared LLVM (that's where the JIT speed lives). The build toolchain is huge (LLVM, clang, Rust, ~160 -dev packages), so the Dockerfile is multi-stage: compile Mesa, then COPY only the artifacts into a clean image. 2.65GB instead of 4.5GB.

You can't tell which renderer a node uses by looking at it. apt list lies (we side-load Mesa over the package), and the real answer lives inside the page. So browserless.report() asks the live GL context directly:

const browserless = require('browserless')
const report = await browserless.report()
console.log(report)

browserless.report() from a production node. Expand gpu and cpu for the full picture.

The gpu block is the whole story:

type is software / llvmpipe here. swiftshader would mean we fell back; hardware would mean a GPU appeared.
mesa is read from the loaded libgallium-<ver>.so, not dpkg, which reports the stale package version under our side-load.
simdWidth: 256 means llvmpipe is using AVX2, which is most of why it's fast.
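
For intuition, here is one way the live GL context can be queried and classified using the standard WEBGL_debug_renderer_info extension. This is a sketch under assumptions, not how browserless.report() is implemented: `readUnmaskedRenderer` and `classifyRenderer` are hypothetical names, the returned shape only loosely mirrors the gpu block described above, and the renderer string formats are assumed from typical ANGLE and Mesa output.

```javascript
// Runs inside the page (for example via page.evaluate). Returns the unmasked
// renderer string, or null when WebGL or the debug extension is unavailable.
function readUnmaskedRenderer () {
  const gl = document.createElement('canvas').getContext('webgl')
  if (!gl) return null
  const ext = gl.getExtension('WEBGL_debug_renderer_info')
  return ext ? gl.getParameter(ext.UNMASKED_RENDERER_WEBGL) : null
}

// Hypothetical classifier. Assumes llvmpipe strings contain "llvmpipe" and a
// "<n> bits" SIMD width, and SwiftShader strings contain "SwiftShader".
function classifyRenderer (renderer) {
  if (!renderer) return { type: 'none', device: null, simdWidth: null }
  const text = renderer.toLowerCase()
  if (text.includes('llvmpipe')) {
    const bits = text.match(/(\d+) bits/)
    return { type: 'software', device: 'llvmpipe', simdWidth: bits ? Number(bits[1]) : null }
  }
  if (text.includes('swiftshader')) {
    return { type: 'swiftshader', device: 'swiftshader', simdWidth: null }
  }
  // Anything else is treated as a real GPU in this sketch.
  return { type: 'hardware', device: renderer, simdWidth: null }
}
```

Keeping the page-side read separate from the classification means the string matching can be exercised without a browser.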

report({ benchmark: true }) adds a deterministic shader benchmark (~300ms on llvmpipe) for comparing nodes.

The flat-2D fallback is dangerous because it looks like success. So CI asserts on report(): gpu.type must be software, gpu.device must be llvmpipe. Any drift fails the build instead of shipping flat 3D. The same call runs against production pods.

The diff is one line. Proving it was the right line took weeks of measurement, mostly fighting two traps:

Dev machines lie. A real GPU renders pages that come back black on prod. Every number had to come from prod-shaped hardware.
Single runs lie. Cold JIT, first-paint races, shared cores. The fastest-looking result was sometimes the wrong one: the flat fallback shipped ~1s quicker.

That's why the deterministic benchmark exists: fixed shader, forced frames, one stable number. With it, the comparison stopped being anecdotal. SwiftShader landed at ~24-31s; llvmpipe at ~6s warm and correct.
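
The "single runs lie" problem can also be softened on the measurement side. This is a generic sketch, not the browserless benchmark: `summarizeRuns` is a hypothetical helper that discards warm-up runs (cold JIT) and reports the median of the rest, so one lucky or unlucky run does not decide the comparison.

```javascript
// Hypothetical helper: summarizes repeated timings in milliseconds.
// The first `warmupRuns` samples are dropped to exclude cold-JIT cost.
function summarizeRuns (durationsMs, warmupRuns = 1) {
  const warm = durationsMs.slice(warmupRuns)
  if (warm.length === 0) throw new Error('need at least one run after warm-up')
  const sorted = [...warm].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  // Median: middle value, or the mean of the two middle values.
  const median = sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2
  return { runs: sorted.length, median, min: sorted[0], max: sorted[sorted.length - 1] }
}
```

Timing alone is still not enough: since the flat fallback was the faster result, each run's output also has to be checked for correctness before its number counts.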

Same 3D chart, same GPU-less hardware, measured on production:

Metric	SwiftShader (before)	Mesa llvmpipe (after)
Render time (isolated)	~24s	~6s (~4×)
Render time (under load)	~24s	7–14s (~2×)
Failed requests	timed out → errors	none
Active renderer	SwiftShader	llvmpipe (asserted in CI)

Isolated, the chart finishes in ~6s. Under real traffic, where captures share cores, expect ~2×. Either way, the requests that used to time out now finish.

The following examples show how to call the Microlink API from the CLI, cURL, JavaScript, Python, Ruby, PHP and Golang. Each one targets the URL 'https://threejs.org/examples/webgl_animation_skinning_blending' with the 'screenshot.animated' API parameter:

CLI Microlink API example

microlink 'https://threejs.org/examples/webgl_animation_skinning_blending&screenshot.animated'

cURL Microlink API example

curl -G "https://api.microlink.io" \
  -d "url=https://threejs.org/examples/webgl_animation_skinning_blending" \
  -d "screenshot.animated=true"

JavaScript Microlink API example

import mql from '@microlink/mql'

const { data } = await mql('https://threejs.org/examples/webgl_animation_skinning_blending', {
  screenshot: {
    animated: true
  }
})

Python Microlink API example

import requests

url = "https://api.microlink.io/"

querystring = {
    "url": "https://threejs.org/examples/webgl_animation_skinning_blending",
    "screenshot.animated": "true"
}

response = requests.get(url, params=querystring)

print(response.json())

Ruby Microlink API example

require 'uri'
require 'net/http'

base_url = "https://api.microlink.io/"

# String keys: a bare `screenshot.animated:` is not a valid Ruby hash key.
params = {
  "url" => "https://threejs.org/examples/webgl_animation_skinning_blending",
  "screenshot.animated" => "true"
}

uri = URI(base_url)
uri.query = URI.encode_www_form(params)

http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true

request = Net::HTTP::Get.new(uri)
response = http.request(request)

puts response.body

PHP Microlink API example

<?php

$baseUrl = "https://api.microlink.io/";

$params = [
    "url" => "https://threejs.org/examples/webgl_animation_skinning_blending",
    "screenshot.animated" => "true"
];

$query = http_build_query($params);
$url = $baseUrl . '?' . $query;

$curl = curl_init();

curl_setopt_array($curl, [
    CURLOPT_URL => $url,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_ENCODING => "",
    CURLOPT_MAXREDIRS => 10,
    CURLOPT_TIMEOUT => 30,
    CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
    CURLOPT_CUSTOMREQUEST => "GET"
]);

$response = curl_exec($curl);
$err = curl_error($curl);

curl_close($curl);

if ($err) {
    echo "cURL Error #: " . $err;
} else {
    echo $response;
}

Golang Microlink API example

package main

import (
    "fmt"
    "net/http"
    "net/url"
    "io"
)

func main() {
    baseURL := "https://api.microlink.io"

    u, err := url.Parse(baseURL)
    if err != nil {
        panic(err)
    }
    q := u.Query()
    q.Set("url", "https://threejs.org/examples/webgl_animation_skinning_blending")
    q.Set("screenshot.animated", "true")
    u.RawQuery = q.Encode()

    req, err := http.NewRequest("GET", u.String(), nil)
    if err != nil {
        panic(err)
    }

    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    body, err := io.ReadAll(resp.Body)
    if err != nil {
        panic(err)
    }

    fmt.Println(string(body))
}

Try it: a WebGL page captured live through ANGLE → Mesa llvmpipe. See the animated screenshot docs for the parameters.

Software GL closes most of the gap, not all. Heavy fragment-shader heroes can still come back black, because the canvas hasn't painted by capture time. That's a first-paint race, not a renderer problem, and no flag fixes it. The real fixes are gating capture on first paint, or real GPUs. We're doing the first.
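
As an illustration of what gating capture on first paint could look like, here is one possible heuristic: poll the frame until it is no longer a single flat color, then capture. This is a sketch under assumptions and not Microlink's implementation. `isFlatFrame` and `waitForPaint` are hypothetical names, and `readPixels` is a caller-supplied async callback that returns RGBA bytes for the current frame (how it obtains them is left open).

```javascript
// True when every pixel in an RGBA buffer equals the first one (a flat frame).
function isFlatFrame (rgba) {
  for (let i = 4; i < rgba.length; i += 4) {
    if (rgba[i] !== rgba[0] || rgba[i + 1] !== rgba[1] ||
        rgba[i + 2] !== rgba[2] || rgba[i + 3] !== rgba[3]) return false
  }
  return true
}

// Polls `readPixels` until the frame is no longer flat. Resolves to true once
// something has painted, or false if the timeout expires first.
async function waitForPaint (readPixels, { timeoutMs = 5000, intervalMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs
  while (true) {
    if (!isFlatFrame(await readPixels())) return true
    if (Date.now() + intervalMs > deadline) return false
    await new Promise(resolve => setTimeout(resolve, intervalMs))
  }
}
```

The heuristic has an obvious blind spot: a scene that is legitimately one solid color never passes, so the timeout branch should fall back to capturing anyway rather than failing the request.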

For everything else, moving from SwiftShader to llvmpipe turned our slowest, flakiest requests into ordinary ones.