Show HN: Make PDFs look scanned (CLI or in the browser via WASM)

Original link: https://github.com/overflowy/make-look-scanned

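**make-look-scanned** is a tool that turns clean PDF documents into realistic digital scans. It rasterizes each page and runs it through an effect pipeline that simulates the imperfections of a physical printout: skew, grayscale, paper tone, noise, blur, edge shadow, and JPEG compression artifacts.

The project offers two main ways to run it:

* **Command-line interface (CLI):** A Go binary that uses MuPDF for high-quality rasterization. Processing is deterministic by default (seeded from a hash of the input's content), but the randomization and settings can be customized through a `config.toml` file or explicit CLI flags.
* **Web:** A WASM-based browser version that uses PDF.js for rendering. It ships as a single self-contained HTML file that works offline, with no external dependencies or server-side processing.

Core features include adjustable parameters for each effect and the ability to define reusable presets. Because it depends on MuPDF, the CLI is licensed under AGPL-3.0; the browser version instead uses PDF.js, which is Apache-2.0 licensed. Both versions drop the original selectable text, so the output faithfully reproduces a scanned document.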

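A recent Show HN post introduced a tool that makes digital PDFs look as if they had been printed and scanned. The developer set out to offer a more realistic alternative to existing online tools, which often require users to upload sensitive documents to servers of questionable security.

The project sparked lively discussion on Hacker News, with many users pointing out its practical value. Several commenters said they regularly run into bureaucratic requirements (for example, corporate or legal policies) that documents be printed, signed by hand, and scanned back into digital form. With this tool (or a hand-written ImageMagick script), users can skip that inefficient, purely formal step. While some users debated the ethics of submitting documents produced this way, others appreciated the project's focus on privacy and on control over the visual result, and compared the appeal of the "scanned" look to nostalgia for analog formats such as vinyl records.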

Original text

A CLI that takes a PDF and degrades it to look like a physical scan of a printout — skew, grayscale, warm paper tone, scanner grain, defocus, edge shadow, and JPEG compression artifacts. Also runs client-side in the browser via WASM.


Each page is rasterized to an image, run through the effect pipeline, and reassembled into a new image-only PDF (the original selectable text is gone — faithful to a basic scanner).

Requires Go and a C toolchain (go-fitz links MuPDF via cgo, so the binary is self-contained — nothing to install at runtime).

```sh
# build
go build -o make-look-scanned .

# usage
make-look-scanned [flags] input.pdf
```

Flags may appear before or after the input filename.

```sh
make-look-scanned in.pdf                 # -> in.scanned.pdf
make-look-scanned in.pdf -o out.pdf
make-look-scanned in.pdf --noise 0.4 --skew 2.5 --jpeg-quality 30
```
| Flag | Default | Meaning |
|---|---|---|
| `-o` | `<input>.scanned.pdf` | output path |
| `--preset` | | named preset from `config.toml` |
| `--seed` | content hash | random seed (override for a new look) |
| `--force` | `false` | overwrite an existing output file |
| `--dpi` | `150` | render resolution |
| `--skew` | `0.6` | max rotation in degrees (0 disables) |
| `--grayscale` | `true` | desaturate (`--grayscale=false` keeps color) |
| `--paper-tone` | `0.6` | warm paper tint strength, 0..1 |
| `--noise` | `0.08` | scanner grain, 0..1 |
| `--blur` | `0.4` | defocus Gaussian sigma |
| `--edge-shadow` | `0.15` | border vignette, 0..1 |
| `--jpeg-quality` | `70` | JPEG quality, 1..100 |

Each numeric knob disables its effect at 0.

Output is deterministic by default: the seed is derived from the input PDF's content, so the same file always produces the same scan. Pass --seed N for a different (but reproducible) look. Same input + seed yields a byte-identical PDF.

Define reusable bundles in $XDG_CONFIG_HOME/make-look-scanned/config.toml (falls back to ~/.make-look-scanned/config.toml when XDG_CONFIG_HOME is unset). Keys mirror the flag names with underscores:

```toml
[presets.medium]
skew = 1.5
paper_tone = 0.6
noise = 0.2
blur = 0.6
edge_shadow = 0.3
jpeg_quality = 45
```

```sh
make-look-scanned --preset medium in.pdf
```

Precedence: built-in defaults → selected preset → explicit CLI flags (flags always win).

The effect pipeline also runs in the browser. go-fitz/MuPDF can't compile to wasm, so the browser uses PDF.js to rasterize pages and hands the pixels to the same Go effects + assembly code compiled to wasm.

Dev (needs network for the PDF.js CDN):

```sh
./web/build.sh                            # builds web/main.wasm + wasm_exec.js
(cd web && python3 -m http.server 8080)   # then open http://localhost:8080
```

Single self-contained file (works offline, nothing to serve):

```sh
task build:web                       # writes dist/make-look-scanned.html (~8 MB)
```

dist/make-look-scanned.html inlines the wasm, Go's runtime glue, and PDF.js (library + worker) as base64 — open it directly in a browser. Output is visually equivalent to the CLI but not byte-identical, since PDF.js and MuPDF rasterize differently.

AGPL-3.0. The CLI statically links MuPDF (via go-fitz), which is AGPL-3.0, so the combined binary is AGPL-3.0 — distributing it requires offering the corresponding source. The browser build does not include MuPDF (it uses PDF.js, Apache-2.0).
