Raress96/Dolby-Atmos-encoder: PoC Dolby Atmos encoder

Original link: https://github.com/raress96/dolby-atmos-encoder

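This repository documents research into converting a Dolby TrueHD Atmos track into E-AC-3 (Dolby Digital Plus) with Joint Object Coding (JOC). The goal is to let hardware that cannot bitstream TrueHD directly (for example, a TV connected over eARC) render object-based Atmos. The project provides a Rust toolchain that parses Dolby Atmos Master (DAMF) files, renders the object audio to a 5.1 bed, encodes the metadata (OAMD/JOC), and injects it into an E-AC-3 core. The output is technically well-formed and is accepted by software decoders (ffmpeg, Cavern), but **it does not trigger Atmos playback on certified hardware.** Two proprietary walls stand in the way:

1. **Core quality:** open-source E-AC-3 encoders lack the channel coupling that Dolby-certified hardware conformance requires.
2. **Cryptographic verification:** the EMDF container requires a "protection bits" signature generated with a keyed HMAC-style code. The algorithm is proprietary and undocumented, and it cannot be bypassed without Dolby's private key.

As a result, open-source tools cannot currently produce Atmos audio that certified hardware accepts. The project serves only as an objective, documented research record and proof of concept. Its practical conclusion for end users is that the only workable option is to bypass eARC and connect the media source directly to an Atmos-capable AV receiver (AVR).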


Convert a Dolby Atmos Master (DAMF) — as decoded from a Dolby TrueHD + Atmos stream by truehdd — into E-AC-3 (Dolby Digital Plus) with Joint Object Coding (JOC), i.e. "DD+ Atmos". The aim was to let consumer gear that can't bitstream TrueHD Atmos (e.g. an LG TV → Denon AVR over eARC) render real object-based Atmos with height.

Status: research-complete, hardware-blocked

Every stage below checks out against every software oracle available (ffmpeg 7, VoidXH/Cavern): the output is detected as E-AC-3 (Dolby Digital Plus + Dolby Atmos), decodes with the right object count and valid 3D object positions (including height), and is CRC-clean. But it does not engage Atmos on Dolby-certified hardware: playback falls back to Dolby Surround.

The cause is two independent, proprietary walls (see Why it can't fully work). This repository is the honest, documented artifact of that investigation, with a clean seam for the one missing cryptographic piece.

This is for personal / interoperability research use only. See Licensing & provenance.

A near-complete reimplementation of the relevant parts of a DD+ Atmos encoder, in Rust:

DAMF reader — parses the Dolby Atmos Master (.atmos / .audio / .metadata).
5.1 downmix renderer — VBAP-pans the Atmos objects to a 5.1 bed (L R C LFE Ls Rs).
OAMD encoder (ETSI TS 103 420) — per-frame Object Audio Metadata: object positions (with elevation), bed/LFE, program assignment. Round-trips through our decoder and Cavern.
JOC encoder — Joint Object Coding matrices over the 5-channel core × parameter bands, with Dolby's quant/Huffman config. Bit-exact round-trip.
EMDF container (ETSI TS 102 366 Annex H) — wraps OAMD (id 11) + JOC (id 14) with the emdf_protection field, carried in the E-AC-3 audio-block skip field exactly where real Dolby streams put it (recomputing frmsiz + crc2).
addbsi signaling — the flag_ec3_extension_type_a + complexity_index Atmos-detection flag.
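
The VBAP step of the 5.1 downmix renderer listed above can be illustrated with a minimal horizontal-only sketch. It is not the crate's `render.rs`: the function name `vbap_5_1`, the nominal speaker azimuths (0°, ±30°, ±110°, counter-clockwise positive), and the decision to ignore elevation and leave LFE silent are all assumptions made for illustration. For a source direction, it finds the adjacent speaker pair whose gains are both non-negative and normalizes them to unit power.

```rust
// Output channel order follows the README: L R C LFE Ls Rs.
const L: usize = 0;
const R: usize = 1;
const C: usize = 2;
const LS: usize = 4;
const RS: usize = 5;

// (channel index, azimuth in degrees), sorted by azimuth.
// The angles are assumed nominal positions, not values taken from the crate.
const RING: [(usize, f64); 5] = [(RS, -110.0), (R, -30.0), (C, 0.0), (L, 30.0), (LS, 110.0)];

fn unit(az_deg: f64) -> (f64, f64) {
    let a = az_deg.to_radians();
    (a.cos(), a.sin())
}

/// Pairwise 2D VBAP: solve g1*l1 + g2*l2 = p for each adjacent speaker pair
/// and keep the pair where both gains are non-negative.
pub fn vbap_5_1(az_deg: f64) -> [f64; 6] {
    let (px, py) = unit(az_deg);
    let mut gains = [0.0; 6];
    for i in 0..RING.len() {
        let (ch1, az1) = RING[i];
        let (ch2, az2) = RING[(i + 1) % RING.len()]; // last pair wraps Ls -> Rs
        let (x1, y1) = unit(az1);
        let (x2, y2) = unit(az2);
        let det = x1 * y2 - x2 * y1;
        let g1 = (px * y2 - py * x2) / det;
        let g2 = (x1 * py - y1 * px) / det;
        if g1 >= -1e-9 && g2 >= -1e-9 {
            let (g1, g2) = (g1.max(0.0), g2.max(0.0));
            let norm = (g1 * g1 + g2 * g2).sqrt();
            gains[ch1] = g1 / norm; // unit-power normalization
            gains[ch2] = g2 / norm;
            break;
        }
    }
    gains
}
```

A source at 0° lands entirely in C, and a source at 15° splits equally between C and L. A full object renderer would also have to account for elevation and object size, which this sketch leaves out.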

The E-AC-3 core itself is encoded externally (e.g. by ffmpeg); this tool injects/synthesizes the Atmos metadata layer. See Command variants below.
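
Injecting metadata into an externally encoded core changes frame bytes, so the trailing crc2 has to be recomputed. The sketch below shows one way to do that, assuming the usual AC-3 family CRC-16 (polynomial x^16 + x^15 + x^2 + 1, zero initial value, MSB first) computed over everything after the syncword and stored big-endian in the last two bytes. The function names are hypothetical and are not the crate's API.

```rust
/// CRC-16 with polynomial 0x8005, zero init, most-significant bit first.
pub fn crc16_ac3(data: &[u8]) -> u16 {
    let mut crc: u16 = 0;
    for &byte in data {
        crc ^= (byte as u16) << 8;
        for _ in 0..8 {
            crc = if crc & 0x8000 != 0 {
                (crc << 1) ^ 0x8005
            } else {
                crc << 1
            };
        }
    }
    crc
}

/// Recompute the trailing crc2 of one syncframe in place.
/// Assumed coverage: every byte after the 2-byte syncword, up to crc2 itself.
pub fn refresh_crc2(frame: &mut [u8]) {
    let n = frame.len();
    assert!(n >= 4, "frame too short to hold a syncword and crc2");
    let crc = crc16_ac3(&frame[2..n - 2]);
    frame[n - 2..].copy_from_slice(&crc.to_be_bytes());
}

/// With this CRC form, running it over the covered bytes plus the stored
/// crc2 leaves a zero remainder when the frame is intact.
pub fn crc2_ok(frame: &[u8]) -> bool {
    frame.len() >= 4 && crc16_ac3(&frame[2..]) == 0
}
```

A typical use is to call `refresh_crc2` as the last step after rewriting a frame's skip field and frame size, then confirm with `crc2_ok` before writing the frame out.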

(Stand-alone crate — it does not depend on the truehdd library; it consumes the DAMF files truehdd produces. The only out-of-the-ordinary deps are hmac/sha2, used by the EMDF signing seam.)

The frame pipeline streams: eac3::transform_frames_io reads the E-AC-3 core in a few-MiB rolling buffer, transforms each syncframe, and writes it straight out, so memory stays bounded regardless of file length (E-AC-3 frames are ≤4 KB, so a 4 MiB chunk batches ~1000 frames per read). eac3inject and oamd are fully streamed (input + output); atmos streams its output and still loads the core once for per-frame JOC context. Fully streaming atmos's input and the (large) master audio essence is possible future work. Note: this is plain synchronous streaming — async would add a runtime without helping a single-file, CPU-bound batch transform.
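
The bounded-memory idea can be sketched as follows. This is not `eac3::transform_frames_io`: the hypothetical `transform_frames` below reads one syncframe at a time instead of a multi-MiB rolling buffer, and it assumes the standard E-AC-3 header layout (16-bit syncword 0x0B77, then strmtyp in 2 bits, substreamid in 3 bits, and frmsiz in 11 bits, with the frame length equal to (frmsiz + 1) 16-bit words).

```rust
use std::io::{self, Read, Write};

/// Frame length in bytes from the first 4 header bytes, or None if the
/// syncword is missing. The maximum is (2047 + 1) * 2 = 4096 bytes.
pub fn eac3_frame_len(header: [u8; 4]) -> Option<usize> {
    if header[0] != 0x0B || header[1] != 0x77 {
        return None;
    }
    let frmsiz = (((header[2] & 0x07) as usize) << 8) | header[3] as usize;
    Some((frmsiz + 1) * 2)
}

/// Fill `buf`, returning Ok(false) on a clean end of stream.
fn read_header<R: Read>(r: &mut R, buf: &mut [u8; 4]) -> io::Result<bool> {
    let mut filled = 0;
    while filled < buf.len() {
        match r.read(&mut buf[filled..])? {
            0 if filled == 0 => return Ok(false),
            0 => return Err(io::ErrorKind::UnexpectedEof.into()),
            n => filled += n,
        }
    }
    Ok(true)
}

/// Read, transform, and write one syncframe at a time. Only one frame
/// (at most 4 KB) is held in memory regardless of input length.
pub fn transform_frames<R: Read, W: Write>(
    mut input: R,
    mut output: W,
    mut transform: impl FnMut(&mut Vec<u8>),
) -> io::Result<usize> {
    let mut frame = Vec::with_capacity(4096);
    let mut header = [0u8; 4];
    let mut count = 0;
    while read_header(&mut input, &mut header)? {
        let len = eac3_frame_len(header)
            .filter(|&l| l >= 4)
            .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "bad E-AC-3 header"))?;
        frame.clear();
        frame.extend_from_slice(&header);
        frame.resize(len, 0);
        input.read_exact(&mut frame[4..])?;
        transform(&mut frame); // may grow or shrink the frame
        output.write_all(&frame)?;
        count += 1;
    }
    Ok(count)
}
```

Wrapping the input in a `std::io::BufReader` and the output in a `BufWriter` would give similar batching to the rolling buffer described above without changing the logic.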

You need a source that actually contains Dolby TrueHD + Atmos. Public sample sources:

A TrueHD Atmos clip is the lossless source you decode to a DAMF master; a known-good DD+ JOC Atmos clip is handy as a reference for jocprobe / emdfverify.

Toolchain: from a Blu-ray MKV to a testable stream

Tools: truehdd, ffmpeg (≥5), mkvextract / mkvmerge (MKVToolNix), this crate, and optionally Cavern (verification, below).

# 0. Build this tool.
cargo build --release                              # -> target/release/dolby-atmos-encoder

# 1. Extract the TrueHD elementary stream from the MKV.
#    Find the TrueHD track first:  ffprobe movie.mkv
ffmpeg -i movie.mkv -map 0:a:0 -c copy -f truehd movie.thd
#    (or, by track id:  mkvextract tracks movie.mkv 1:movie.thd)

# 2. Decode TrueHD -> Dolby Atmos Master (DAMF).  --presentation 3 selects the Atmos presentation.
truehdd decode --presentation 3 movie.thd --output-path master
#    -> master.atmos  +  master.atmos.audio  +  master.atmos.metadata

# 3. Render objects to a 5.1 bed, then encode an E-AC-3 core with ffmpeg.
#    Pipe raw f32le straight into ffmpeg — works for any length (no WAV size cap):
dolby-atmos-encoder downmix master.atmos --out - \
  | ffmpeg -f f32le -ar 48000 -ac 6 -i - -c:a eac3 -b:a 768k -f eac3 core.eac3
#    (Short clips only — `--out downmix.wav` then `ffmpeg -i downmix.wav ...` — but a 5.1
#     32-bit-float WAV is capped at 4 GB, i.e. ~58 min at 48 kHz, so prefer the pipe above.)

# 4. Inject per-frame Atmos metadata (OAMD + JOC + the addbsi detection flag) into the core.
dolby-atmos-encoder atmos core.eac3 master.atmos --out atmos.eac3
#    (add  --emdf-key <hex>  to sign the EMDF protection field — see "The signing seam")

# 5. Mux into an MKV alongside your video.
mkvmerge -o out.mkv --no-audio movie.mkv atmos.eac3

# 6. Quick check (full verification with Cavern below).
ffmpeg -i atmos.eac3            # expect: Audio: eac3 (... Dolby Atmos ...)

⚠️ This yields a stream ffmpeg/Cavern accept as Atmos, but it will not engage Atmos on Dolby-certified hardware — see Why it can't fully work.
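
The WAV size cap mentioned in step 3 is simple arithmetic, sketched here with hypothetical helper names. A 5.1 stream of 32-bit float samples at 48 kHz produces 48,000 × 6 × 4 = 1,152,000 bytes per second, so a cap of 4 × 10^9 bytes is reached after roughly 58 minutes.

```rust
/// Raw PCM data rate in bytes per second.
pub fn pcm_bytes_per_second(sample_rate: u64, channels: u64, bytes_per_sample: u64) -> u64 {
    sample_rate * channels * bytes_per_sample
}

/// How many minutes of audio fit under a byte cap at a given data rate.
pub fn max_minutes(cap_bytes: u64, bytes_per_second: u64) -> f64 {
    cap_bytes as f64 / bytes_per_second as f64 / 60.0
}
```

With `pcm_bytes_per_second(48_000, 6, 4)` as the rate, `max_minutes(4_000_000_000, rate)` is about 57.9. If the cap is instead taken as the largest value of a 32-bit size field (2^32 − 1 bytes), the same calculation gives about 62.1 minutes. Either way a feature-length film does not fit, which is why the raw pipe is the safer path.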

Command variants

Run each subcommand with --help for details.

Command	Purpose
`inspect <atmos>`	Report a DAMF master's bed/object layout and metadata.
`downmix <atmos> --out d.wav`	Render objects to a 5.1 bed WAV (`--out -` streams raw f32le to stdout — no 4 GB cap, pipe into ffmpeg).
`atmos <core> <atmos> --out o.eac3`	Inject OAMD+JOC metadata + the addbsi flag into an E-AC-3 core.
`oamd <core> <atmos> --out o.eac3`	OAMD only (no JOC).
`coregraft <realcore> <myatmos> --out o.eac3`	Splice our metadata onto a real Dolby core (diagnostic).
`graft <core> <reference> --out o.eac3`	Splice Dolby's metadata onto our core (diagnostic).
`jocprobe` / `eac3probe` / `walkprobe` / `bsidump`	Stream inspectors.
`oamddump <hex>`	Verbose field-by-field OAMD decode.
`emdfverify <input>`	Check whether our protection CRC matches a stream's stored `emdf_protection`.

Global: --emdf-key <hex|@file> and --emdf-key-id <0..7> (or DOLBY_EMDF_KEY) — see The signing seam.

Cavern is the open-source decoder we validate against: if Cavern reports objects, the OAMD/JOC metadata is well-formed. A small harness is included in tools/cavernprobe:

# Needs the .NET 8 SDK and a local Cavern checkout (adjust the path in cavernprobe.csproj).
dotnet build tools/cavernprobe -c Release
dotnet tools/cavernprobe/bin/Release/net8.0/cavernprobe.dll atmos.eac3

It prints the channel layout, HasObjects, and the decoder's full metadata (object_count, joc_num_objects, object positions). On a stream from this tool:

channels    : 6
HasObjects  : True
[JOC information]
  joc_num_objects (Number of rendered dynamic objects): ...
...
RESULT: object-based audio detected (Atmos objects present).

Exit code 0 = objects present, 2 = channel-based only. (The eac3 decoder in ffmpeg ≥5 also appends "+ Dolby Atmos" to the codec description, which makes a faster first-pass sanity check.)

Run end-to-end on a full feature film — a Logan (2017) 2160p UHD Blu-ray remux, TrueHD 7.1 + Atmos (137 min) — on a single WSL2 workstation:

Stage	Result
`truehdd decode`	9,891,890 TrueHD frames → 395,675,600 samples (full 137 min), 16 GB DAMF essence
`downmix --out -` → ffmpeg	791 MB E-AC-3 core @ 768 kbps (raw-pipe path; no WAV size cap hit)
`atmos` inject	257,602 E-AC-3 frames; 13 dynamic objects + LFE; JOC avg 213 B EMDF/frame
Carriage	skip-field on 257,602 / 257,602 frames (100%); addbsi detection flag on all
Output	809 MB `atmos.eac3`
Cavern	`HasObjects=True`, 13 objects, 1 bed instance, 5-channel JOC downmix
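
The figures in the table are mutually consistent, which the following sketch checks. It assumes 48 kHz audio and 1536 samples per E-AC-3 syncframe (six blocks of 256 samples); both are assumptions about this particular encode rather than values stated in the table, and the helper names are hypothetical.

```rust
pub const SAMPLE_RATE: u64 = 48_000;
pub const SAMPLES_PER_EAC3_FRAME: u64 = 1536; // assumed: 6 blocks x 256 samples

/// Number of syncframes needed to carry `samples` samples (rounded up).
pub fn eac3_frame_count(samples: u64) -> u64 {
    (samples + SAMPLES_PER_EAC3_FRAME - 1) / SAMPLES_PER_EAC3_FRAME
}

/// Size of one constant-bitrate syncframe in bytes.
pub fn eac3_frame_bytes(bitrate_bps: u64) -> u64 {
    bitrate_bps * SAMPLES_PER_EAC3_FRAME / (8 * SAMPLE_RATE)
}

/// Duration in minutes of `samples` samples.
pub fn minutes(samples: u64) -> f64 {
    samples as f64 / SAMPLE_RATE as f64 / 60.0
}
```

For the 395,675,600 samples reported by truehdd decode, `eac3_frame_count` gives 257,602 frames, `eac3_frame_bytes(768_000)` gives 3,072 bytes per frame, and their product is 791,353,344 bytes, matching the 791 MB core. The duration works out to about 137.4 minutes, and 395,675,600 / 9,891,890 is exactly 40 samples per TrueHD frame.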

Memory is bounded by frame count, not file size. Every subcommand streams the core through eac3::transform_frames_io (~4 MiB rolling buffer), so the working set for the core is fixed regardless of length: that same 791 MB Logan core runs eac3inject at 7.9 MB peak RSS (a 1.25 GB synthetic core, 487,424 frames, also holds ~7.9 MB).

atmos builds its JOC context from a streamed header-only scan (eac3::parse_frames_io) of the core plus a streaming pass over the CAF essence. Both are O(number of frames) and never hold the core bytes. It then streams the input core and output through the same path, so it no longer holds the core in RAM: an earlier build peaked at 815 MB on Logan (it loaded the whole core); it now stays in the tens of MB (≈ the 7.9 MB streaming floor plus the per-frame position/power tables).

Output is byte-identical across this refactor (golden SHA-256 verified).

These are decoder-side results: ffmpeg and Cavern both accept the stream as object Atmos. It still will not engage Atmos on Dolby-certified hardware — see below for why.

Why it can't fully work

Two diagnostic builds bisect the problem. Both pass ffmpeg 7 and Cavern; both fall back to Dolby Surround on real hardware:

Wall 1 — the core encoder

graft = our ffmpeg-encoded core + Dolby's genuine metadata → Dolby Surround. So the core itself is rejected even with perfect metadata. ffmpeg's E-AC-3 encoder is not Dolby-grade (e.g. it does no channel coupling); there is no open-source Dolby-conformant E-AC-3 core encoder.

Wall 2 — EMDF keyed authentication (the decisive one)

coregraft = a real Dolby core + our metadata → Dolby Surround. The metadata is rejected too, and we traced it to the EMDF emdf_protection field, which is a keyed authentication code, not a computable checksum:

The spec says so. ETSI TS 102 366 v1.4.1 §H.2.2: key_id "identifies the authentication key used to calculate the value of the protection_bits_primary and protection_bits_secondary fields," and that calculation is "implementation dependent and is not defined in the present document."
Brute force confirms it. emdfverify dumps each real frame's protection bits; an offline sweep of all 256 CRC-8 polynomials × every init/xorout/reflection over 7 candidate byte-regions across 8 real Dolby frames found ZERO matches, and no standard CRC-32 variant matched the 32-bit primary. The null result is the signature of a real secret key.
No open decoder computes it. truehdd marks the field // TODO: HMAC; Cavern and ffmpeg don't validate it at all. Only Dolby's licensed encoder (holding the key) can produce valid protection bits; certified hardware validates them.
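
The brute-force check described above can be sketched as a parameter sweep. This is a hypothetical illustration, not the project's `emdfverify` code: it tries every 8-bit polynomial, every initial value, and all four reflection combinations, and for each candidate derives the only possible xorout from the first sample before testing the rest. That shortcut avoids a separate loop over 256 xorout values.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Crc8Params {
    pub poly: u8,
    pub init: u8,
    pub refin: bool,
    pub refout: bool,
    pub xorout: u8,
}

/// Bitwise CRC-8 with fully parameterized polynomial, init, reflection and xorout.
pub fn crc8(p: Crc8Params, data: &[u8]) -> u8 {
    let mut crc = p.init;
    for &b in data {
        crc ^= if p.refin { b.reverse_bits() } else { b };
        for _ in 0..8 {
            crc = if crc & 0x80 != 0 { (crc << 1) ^ p.poly } else { crc << 1 };
        }
    }
    if p.refout {
        crc = crc.reverse_bits();
    }
    crc ^ p.xorout
}

/// Return every parameter set that reproduces all (covered bytes, stored tag) pairs.
pub fn sweep_crc8(samples: &[(&[u8], u8)]) -> Vec<Crc8Params> {
    let mut hits = Vec::new();
    let Some(&(first_msg, first_tag)) = samples.first() else {
        return hits;
    };
    for poly in 0..=255u8 {
        for init in 0..=255u8 {
            for (refin, refout) in [(false, false), (false, true), (true, false), (true, true)] {
                let mut p = Crc8Params { poly, init, refin, refout, xorout: 0 };
                // The first sample fixes xorout; the remaining samples must agree.
                p.xorout = crc8(p, first_msg) ^ first_tag;
                if samples.iter().all(|&(m, t)| crc8(p, m) == t) {
                    hits.push(p);
                }
            }
        }
    }
    hits
}
```

Samples should differ in length, because for equal-length messages a change of init is indistinguishable from a change of xorout. When the tags really are a CRC-8 of the candidate bytes, the true parameters show up among the hits; when they come from a keyed MAC, an empty result across several frames and byte regions is the expected outcome.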

Conclusion: an open-source encoder that produces hardware-conformant DD+ JOC Atmos is not achievable. The audio coding is solved; the format gates playback behind a proprietary cryptographic signature by design.

The signing seam (`--emdf-key`)

All of the proprietary cryptography is isolated behind one trait, emdf::EmdfProtector, with one real implementation point, emdf::dolby_keyed_mac:

PublicCrcProtector (default) — public CRC-32 / CRC-8. Well-formed but unsigned; this is the historical behaviour and what you get with no key.
KeyedProtector — selected by --emdf-key <hex> (or DOLBY_EMDF_KEY). Routes to dolby_keyed_mac, currently an HMAC-SHA256 stand-in that proves the wiring end-to-end.
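
A trait-based seam of this kind might look like the sketch below. Only the names `EmdfProtector`, `PublicCrcProtector`, and `KeyedProtector` come from this README; the method signature, the `Protection` struct, the choice of CRC-32 and CRC-8 variants, and the way a MAC output is truncated into the 32-bit primary and 8-bit secondary fields are all assumptions. To stay free of external crates, the keyed variant takes the MAC as a closure rather than calling hmac/sha2 directly.

```rust
/// The three container fields named in the README: 3-bit key_id,
/// 32-bit primary and 8-bit secondary protection bits.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Protection {
    pub key_id: u8,
    pub primary: u32,
    pub secondary: u8,
}

pub trait EmdfProtector {
    fn protect(&self, covered: &[u8]) -> Protection;
}

/// Unsigned default: public checksums only (variants assumed here).
pub struct PublicCrcProtector;

fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &b in data {
        crc ^= b as u32;
        for _ in 0..8 {
            crc = if crc & 1 != 0 { (crc >> 1) ^ 0xEDB8_8320 } else { crc >> 1 };
        }
    }
    !crc
}

fn crc8(data: &[u8]) -> u8 {
    let mut crc = 0u8;
    for &b in data {
        crc ^= b;
        for _ in 0..8 {
            crc = if crc & 0x80 != 0 { (crc << 1) ^ 0x07 } else { crc << 1 };
        }
    }
    crc
}

impl EmdfProtector for PublicCrcProtector {
    fn protect(&self, covered: &[u8]) -> Protection {
        Protection { key_id: 0, primary: crc32(covered), secondary: crc8(covered) }
    }
}

/// Keyed variant: `mac(key, message)` stands in for whatever keyed MAC is plugged in.
pub struct KeyedProtector<F> {
    key: Vec<u8>,
    key_id: u8,
    mac: F,
}

impl<F: Fn(&[u8], &[u8]) -> [u8; 32]> KeyedProtector<F> {
    pub fn new(key: Vec<u8>, key_id: u8, mac: F) -> Self {
        Self { key, key_id: key_id & 0x07, mac } // key_id is a 3-bit field
    }
}

impl<F: Fn(&[u8], &[u8]) -> [u8; 32]> EmdfProtector for KeyedProtector<F> {
    fn protect(&self, covered: &[u8]) -> Protection {
        let tag = (self.mac)(&self.key, covered);
        // Assumed truncation: first 4 bytes as primary, 5th byte as secondary.
        let primary = u32::from_be_bytes([tag[0], tag[1], tag[2], tag[3]]);
        Protection { key_id: self.key_id, primary, secondary: tag[4] }
    }
}
```

The rest of the encoder can then hold a `Box<dyn EmdfProtector>` or a generic parameter and never needs to know which implementation is active. Swapping in a different MAC construction would only touch the closure passed to `KeyedProtector::new`.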

To produce a signature real hardware accepts you need both, and even then a caveat:

a valid Dolby authentication key for the correct key_id, and
Dolby's actual keyed-MAC construction — exact covered byte-range, MAC algorithm, bit/byte ordering, truncation, per-key_id handling — implemented in dolby_keyed_mac. This is undocumented (the spec explicitly declines to define it). A key alone is not sufficient: we know the container structure (32-bit primary + 8-bit secondary + 3-bit key_id) but not the algorithm or which bytes it covers (our CRCs over the obvious regions do not match Dolby).
…and a valid signature still does not defeat Wall 1 for a from-scratch file — only the coregraft path (our metadata on a real Dolby core) could benefit.

This project does not contain, ship, or attempt to recover any Dolby key, and it cannot reverse a secure keyed MAC from output samples (that is the security guarantee of a MAC). The seam exists for completeness and research. Producing conformant Atmos requires Dolby's licensed encoder.

What actually works on your hardware (no encoder needed)

The real limitation was never the format — it was that some players won't bitstream TrueHD Atmos. Bypass it entirely: feed the original, untouched TrueHD Atmos straight into the AVR.

PC / media streamer (Kodi, JRiver, Shield, Zidoo…) ──HDMI──> Denon AVR (HDMI in) ──HDMI──> TV (video)

The AVR decodes the original TrueHD Atmos losslessly, with full object heights — zero conversion, zero signature problem. The only requirement is routing audio into the AVR directly rather than through the TV's eARC return.

src/
  main.rs          CLI + command dispatch (clap)
  damf.rs          DAMF (.atmos/.audio/.metadata) reader
  render.rs        Atmos objects -> 5.1 bed (VBAP)
  emdf.rs          EMDF container, OAMD encode/decode, the EmdfProtector signing seam
  joc.rs           Joint Object Coding analysis + payload writer
  joc_tables.rs    JOC quant / Huffman tables
  eac3.rs          E-AC-3 frame parse, BSI, aux/skip-field injection, crc2
  eac3_audblk.rs   A/52 audio-block bit-walker (locates the skip field)

Credits, sources & references

This project stands on open specifications and existing projects:

VoidXH/Cavern — creator Bence Sgánetz (http://en.sbence.hu). DSP/codec math for the 5.1 rendering, the OAMD/JOC decode logic, and the object gain handling was ported / adapted from Cavern's C# decoder. Cavern is released under its own non-commercial, share-alike licence — see Licensing below.
truehdd (Apache-2.0) — the Dolby TrueHD decoder that produces the DAMF (.atmos / .audio / .metadata) master this tool consumes. This crate began life as a truehdd workspace member before being extracted into its own repository.
ETSI TS 102 366 V1.4.1 (free PDF) — Digital Audio Compression (AC-3, Enhanced AC-3), including Annex H (the EMDF extensible-metadata container and the emdf_protection field).
ETSI TS 103 420 V1.2.1 (free PDF) — backwards-compatible carriage of object audio (OAMD / Joint Object Coding) in E-AC-3.
ATSC A/52:2018 (free PDF) — the AC-3 / E-AC-3 audio-block syntax underpinning the skip-field bit-walker.
FFmpeg — used as an external tool for the E-AC-3 core encode and as a reference decoder/validator (invoked as a separate process; not linked).
Dolby specifications and the Dolby Encoding Engine plugin SDK were examined for context only. No Dolby source code or keys are used or included here.

Licensing & provenance

This project includes code ported / adapted from Cavern, whose licence is non-commercial and share-alike and states that any project including the code remains bound by its terms. Accordingly the entire repository is released under the Cavern licence (see LICENSE), not MIT:

✅ free to use, modify, and redistribute for free;
⛔ no selling any part of the original or a modified version;
⛔ no advertisements in the software;
🔗 Cavern (https://github.com/VoidXH/Cavern, creator http://en.sbence.hu) is credited and linked as the source — in Credits and in LICENSE;
public use (e.g. screenings) or commercial use requires the original Cavern creator's permission.

If a permissively-licensed (e.g. MIT) version is ever needed, the Cavern-derived portions must first be clean-room rewritten from the ETSI specs without consulting Cavern's source.

publish = false is set in Cargo.toml to prevent accidental release to crates.io.

"Dolby", "Dolby Atmos", "Dolby Digital Plus", and "TrueHD" are trademarks of Dolby Laboratories. This is an independent, unaffiliated interoperability / research project, not a Dolby product.