Show HN: I successfully failed at one-shot-ing a video codec like h.264

Original link: https://github.com/DheerG/libsinter

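This project describes "sinter", an experimental video codec built entirely with Claude Code agent teams that simulate a panel of five codec experts (including figures central to the development of VP9, AV1, H.264, and HEVC). The goal was not production-ready compression but to explore one-shot agent workflows in a complex domain where the author had no prior experience. Sinter uses lapped transforms, perceptual vector quantization (PVQ), and rANS entropy coding, and aims to be a patent-free alternative to established codecs such as H.264. It achieves competitive perceptual quality but currently lags far behind in compression efficiency: at comparable luma quality, its files are 18.6x the size of H.264's. The simulated expert team identified missing features such as sub-pixel motion compensation and B-frames as the main causes of the gap. They estimate that a patent-safe design could realistically reach about 4-6x the size of H.264. The codebase, about 5,000 lines of C built over 12 improvement loops, shows what AI-driven development can do in a specialized domain.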


Original text

An experimental, vibecoded video codec built from scratch using Claude Code agent teams. Explores lapped transforms, perceptual vector quantization (PVQ), and rANS entropy coding as a patent-free alternative to the H.264/H.265 lineage.

This was a learning experiment, not a production codec. The goal was to test one-shot agent team workflows on a domain I had zero prior experience in — and to see how far a simulated expert team could push a novel architecture. Write-up: One-Shot Wonder | Claude Agent Teams

At comparable luma quality (~49 dB): 18.6x larger than H.264. The architecture produces competitive perceptual quality but cannot match H.264's compression efficiency without adopting the same tools (B-frames, sub-pel MC, CABAC-level entropy coding).

QP    Sinter PSNR    Sinter Size    H.264 PSNR    H.264 Size
4     49.10 dB       136 KB         74.59 dB      33 KB
20    33.70 dB       37 KB          55.12 dB      11 KB
28    28.41 dB       13 KB          48.37 dB      7 KB

(256x256 testsrc, 30 frames. Full BD-rate data in SCOREBOARD.md.)

~5,000 lines of C across 12 improvement loops:

  • Lapped transforms (TDLT, 20% Malvar lifting) — eliminates blocking artifacts structurally
  • Hybrid PVQ/scalar quantization — preserves texture where H.264 smooths it away
  • Dual-interleaved rANS with 10 PARA-adaptive CDFs and cross-frame carry
  • Inter prediction (P-frames, integer-pel full search, median MV prediction, skip mode)
  • Chroma inter (luma MV at half-resolution for YUV 4:2:0)
  • CLI tools with Y4M I/O (sinterenc / sinterdec)
  • 32 passing tests, entirely patent-free
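
The inter prediction item in the list above can be illustrated with a small sketch of an integer-pel full search using the sum of absolute differences (SAD), plus a component-wise median predictor over three neighbouring motion vectors. All names here (`MotionVector`, `median_mv`, `block_sad`, `full_search`) are hypothetical, and the cost metric, search range, and choice of neighbours are assumptions rather than a description of what sinter actually does.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { int x, y; } MotionVector;

/* Median of three integers. */
static int median3(int a, int b, int c)
{
    int lo = a < b ? a : b;
    int hi = a < b ? b : a;
    if (c < lo) return lo;
    if (c > hi) return hi;
    return c;
}

/* Component-wise median of three neighbouring motion vectors. */
static MotionVector median_mv(MotionVector a, MotionVector b, MotionVector c)
{
    MotionVector m = { median3(a.x, b.x, c.x), median3(a.y, b.y, c.y) };
    return m;
}

/* SAD between the bs x bs block of cur at (bx, by) and the block of ref
 * displaced by (dx, dy). Both planes share the same stride. */
static long block_sad(const uint8_t *cur, const uint8_t *ref, int stride,
                      int bx, int by, int dx, int dy, int bs)
{
    long sad = 0;
    for (int y = 0; y < bs; y++)
        for (int x = 0; x < bs; x++)
            sad += abs((int)cur[(by + y) * stride + bx + x] -
                       (int)ref[(by + dy + y) * stride + bx + dx + x]);
    return sad;
}

/* Exhaustive integer-pel search over [-range, range] in both directions.
 * Candidates that would read outside the reference frame are skipped.
 * On ties, the first candidate in scan order is kept. */
static MotionVector full_search(const uint8_t *cur, const uint8_t *ref,
                                int width, int height, int bx, int by,
                                int bs, int range)
{
    assert(range >= 0 && bs > 0);
    MotionVector best = { 0, 0 };
    long best_sad = LONG_MAX;
    for (int dy = -range; dy <= range; dy++) {
        for (int dx = -range; dx <= range; dx++) {
            if (bx + dx < 0 || by + dy < 0 ||
                bx + dx + bs > width || by + dy + bs > height)
                continue;
            long sad = block_sad(cur, ref, width, bx, by, dx, dy, bs);
            if (sad < best_sad) {
                best_sad = sad;
                best.x = dx;
                best.y = dy;
            }
        }
    }
    return best;
}
```

In a typical design the encoder would code only the difference between the vector found by the search and the median predictor, since neighbouring blocks tend to move together. Whether sinter codes its vectors exactly this way is not stated in the source.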
# Build the library and run the tests
make && make test

# Build CLI
gcc -O2 -I libsinter tools/sinterenc.c -Lbuild -lsinter -lm -o build/sinterenc
gcc -O2 -I libsinter tools/sinterdec.c -Lbuild -lsinter -lm -o build/sinterdec

# Encode/decode
ffmpeg -f lavfi -i "testsrc=duration=1:size=64x64:rate=10" -pix_fmt yuv420p -f yuv4mpegpipe input.y4m -y
./build/sinterenc -i input.y4m -o encoded.sntr -q 12
./build/sinterdec -i encoded.sntr -o decoded.y4m
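
The CLI tools read and write Y4M, a simple container whose stream header is one text line of space-separated tagged fields. The following is a minimal sketch of parsing that line, assuming a header of the general shape `YUV4MPEG2 W64 H64 F10:1 Ip A1:1 C420jpeg`. The names `Y4mInfo`, `y4m_parse_header`, and `y4m_frame_bytes_420` are hypothetical and not taken from sinterenc or sinterdec.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int width, height;
    int fps_num, fps_den;
} Y4mInfo;

/* Parse a Y4M stream header line.
 * Returns 0 on success, -1 if the signature or a required field is missing. */
static int y4m_parse_header(const char *line, Y4mInfo *info)
{
    assert(line != NULL && info != NULL);
    if (strncmp(line, "YUV4MPEG2", 9) != 0)
        return -1;

    memset(info, 0, sizeof *info);
    const char *p = line + 9;
    while (*p != '\0' && *p != '\n') {
        while (*p == ' ')
            p++;
        switch (*p) {
        case 'W': info->width = atoi(p + 1); break;
        case 'H': info->height = atoi(p + 1); break;
        case 'F':
            if (sscanf(p + 1, "%d:%d", &info->fps_num, &info->fps_den) != 2)
                return -1;
            break;
        default: break; /* interlacing, aspect, and chroma tags are ignored here */
        }
        /* Skip the rest of the current field. */
        while (*p != '\0' && *p != ' ' && *p != '\n')
            p++;
    }
    return (info->width > 0 && info->height > 0 && info->fps_den > 0) ? 0 : -1;
}

/* Bytes per frame for 8-bit 4:2:0: one full-size luma plane plus two
 * quarter-size chroma planes. Even dimensions are assumed. */
static size_t y4m_frame_bytes_420(const Y4mInfo *info)
{
    return (size_t)info->width * (size_t)info->height * 3 / 2;
}
```

Each frame in the stream then follows a `FRAME` marker line, so a reader can parse the header once and read a fixed number of bytes per frame.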

The expert team's consensus: the 18.6x gap is the product of missing standard tools, each multiplicative:

Missing Feature           Cost        Notes
No sub-pel MC             1.5-2x      Integer-pel can't track sub-pixel motion
No B-frames               1.5-2x      Patent risk on bidirectional prediction
PVQ overhead              1.5-2x      Gain-shape is more verbose than run-level
Fewer entropy contexts    1.2-1.5x    10 CDFs vs CABAC's hundreds
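
To make the "each multiplicative" claim concrete, here is a small sketch that multiplies the per-feature ranges from the table. The type and function names are hypothetical, and treating the factors as independent is itself an assumption.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *feature;
    double low;   /* lower bound of the estimated size factor */
    double high;  /* upper bound of the estimated size factor */
} CostFactor;

/* Ranges copied from the table above. */
static const CostFactor kFactors[] = {
    { "No sub-pel MC",          1.5, 2.0 },
    { "No B-frames",            1.5, 2.0 },
    { "PVQ overhead",           1.5, 2.0 },
    { "Fewer entropy contexts", 1.2, 1.5 },
};

/* Multiply the per-feature bounds to get the combined range. */
static void combined_range(const CostFactor *f, size_t n,
                           double *low, double *high)
{
    assert(f != NULL && low != NULL && high != NULL);
    *low = 1.0;
    *high = 1.0;
    for (size_t i = 0; i < n; i++) {
        *low *= f[i].low;
        *high *= f[i].high;
    }
}
```

Under this simple model the four listed factors combine to a range of roughly 4.05x to 12x. The measured 18.6x lies above that upper bound, so the table should probably be read as a rough attribution rather than an exact accounting; the rest may come from factors not listed or from the comparison points being only approximately equal in quality.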

Realistic patent-safe ceiling: 4-6x H.264 with half-pel + better contexts. Matching H.264 would require rebuilding most of its toolset, defeating the purpose.

Built with Claude Code. The expert team was assembled by asking Claude to name five experts for building an open-source codec competing with H.264/H.265:

  • Jim Bankoski — Google architect behind VP8 and VP9; open-codec veteran
  • Timothy Terriberry — Mozilla/Xiph engineer; drove Daala and AV1 research
  • Monty Montgomery — Xiph.org founder; built Ogg, Vorbis, and Theora ecosystems
  • Gary Sullivan — Microsoft; co-chaired the H.264 and HEVC standardization efforts
  • Jens-Rainer Ohm — HEVC co-chair; deep expertise in perceptual coding theory

Plus a principal engineer (facilitator) and patent reviewer. All personas simulated by Claude.

  • C11 compiler (GCC or Clang)
  • make
  • ffmpeg (for Y4M generation)
  • No external dependencies