Reverse engineering the Creative Katana soundbar to control it from Linux

Original link: https://blog.nns.ee/2026/02/20/katana-v2x-re/

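As a Linux user, the author reverse engineered the Creative Sound Blaster Katana V2X soundbar to gain control over features such as the equalizer (EQ) and LED lighting, which are normally only available through a Windows application. By capturing USB traffic with Wireshark and analyzing the Creative app's DLLs with dnSpy and Ghidra, the author worked out the device's communication protocol. They found that the soundbar communicates over a CDC ACM serial interface and uses a specific "5A" command framing. They also got past an unusual AES-256-GCM challenge-response authentication mechanism that must be passed before the device accepts control commands. Beyond the basic settings, the author also reverse engineered the firmware upgrade process, identifying the proprietary "CIFF" container format and its components, including the bootloader, the main firmware, and resource files. This work led to the open-source Rust tool `v2x-ctl`, which lets Linux users control Katana V2X hardware without relying on proprietary software. The author plans to explore the device's ARM firmware further in a future project.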


I recently purchased a Creative Sound Blaster Katana V2X soundbar (what a mouthful) to replace my old, cheap Logitech computer speakers. They served me well, but listening to music or watching movies was not the best-sounding experience.

After it arrived, I set it up and realized it had a USB port which, aside from serving as an audio input, allows the user to configure the speaker: set the EQ, switch the LED lights between modes, etc. Unfortunately, this requires the (proprietary) Creative App. What's more, the app only seems to be available for Windows, which I don't use. While using it in a VM worked, it was hardly convenient.

This seemed like the perfect opportunity for something I love: Reverse engineering proprietary applications, devices and protocols and writing tools to communicate with them.

Initial recon

From just looking at the directory where the Creative App was installed, I could tell this was a .NET app. These usually ship a fairly large number of DLLs Named.Like.This.dll, each corresponding to a C# module. The .exe.config file is also a giveaway.

My suspicion was confirmed when I loaded the exe and corresponding DLLs up in dnSpy, a .NET debugger and decompiler. Unfortunately, I also realized that a large portion of the modules were obfuscated and fairly hard to read.

Deciding to leave this aside for now, I turned my focus to the USB comms themselves. Having no clue how the speaker even communicated with the app, I started recording all USB traffic with Wireshark and USBPcap. I did this before even opening the app, as I wanted to capture as much communication as possible.

The first thing the application told me when it found my soundbar was that it needed a firmware upgrade. I let it upgrade, and inspected the USBPcap output. The actual firmware update payload was easily recognizable, as the packets were much larger than any surrounding packets, and fortunately it seemed to be a plaintext firmware blob!

I did write a script to extract the entire firmware file from the packet capture - more on this later.

Reverse engineering the protocol

In order to have captures of everything the application lets the user do, I methodically went through each of the options, clicking things, changing things, and creating a separate capture file for each operation. This took me around an entire day and resulted in ~100 different captures.

This allowed me to analyze the packets, write down notes on what does what, and after a while I had a pretty clear picture of how the protocol works.

The communication happens over a CDC ACM serial interface, so on Linux the speaker shows up as /dev/ttyACM*.

All of the proprietary commands use a simple framing:

5A [cmd] [len] [payload...]

The 0x5A is always static and is likely just the command start marker. cmd is the command opcode for whatever you're trying to do. len, as the name suggests, is the number of payload bytes following and payload is the payload (or subcommand) itself.

Responses are fairly similar, usually with a byte indicating it's a response.

I won't go over all of the different commands, but as an example, here's the command for requesting the current FW version (as a subcommand) and the corresponding response:

Host -> Device:  5a 09 01 02
Device -> Host:  5a 09 12 02 10 "1.3.230619.1820\0"
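
To make the framing concrete, here is a minimal Python sketch that builds and parses `5A` frames and replays the firmware version exchange above. It is an illustration rather than code from v2x-ctl: the function names are made up, the one-byte length field is inferred from the examples, and reading the second payload byte (0x10) as a string length is only a guess based on this single response.

```python
from typing import Tuple

START = 0x5A  # start marker seen at the front of every proprietary command


def build_frame(cmd: int, payload: bytes) -> bytes:
    """Encode `5A [cmd] [len] [payload...]`, assuming len is one byte."""
    if len(payload) > 0xFF:
        raise ValueError("payload does not fit a one-byte length field")
    return bytes([START, cmd, len(payload)]) + payload


def parse_frame(data: bytes) -> Tuple[int, bytes]:
    """Decode a single frame and return (cmd, payload)."""
    if len(data) < 3 or data[0] != START:
        raise ValueError("not a 5A frame")
    cmd, length = data[1], data[2]
    payload = data[3:3 + length]
    if len(payload) != length:
        raise ValueError("truncated frame")
    return cmd, payload


# The firmware version exchange from the capture.
request = build_frame(0x09, b"\x02")
response = bytes.fromhex("5a09120210") + b"1.3.230619.1820\x00"
cmd, payload = parse_frame(response)

# Assumption: payload[0] echoes the subcommand and payload[1] is the length
# of the NUL-terminated ASCII string that follows.
version = payload[2:2 + payload[1]].rstrip(b"\x00").decode("ascii")
```

Here `request` comes out as the four bytes `5a 09 01 02`, matching the host side of the capture.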

Authentication

Before you're even able to send commands, you're supposed to pass a challenge-response authentication to put the device in a mode where it accepts commands. I'm not really sure why this was done - maybe Creative really doesn't want people using third party applications to control the devices they own? In any case, I reverse engineered this as well.

From the first capture, I could see that one of the first exchanges between the device and the host was as follows:

Host -> Device:  "whoareyou.MyApp8\r\n"
Device -> Host:  "whoareyou" 1e 04 83 32 [32 random bytes] "\r\n"
Host -> Device:  "unlock" [64 random bytes] "\r\n"
Device -> Host:  "unlock_OK\r\n"
Host -> Device:  "SW_MODE1\r\n"
[... binary comms ...]

This seemed to be some sort of challenge-response, and the 64 random bytes made me initially consider a simple (HMAC-)SHA512. However, searching through the assemblies in dnSpy for anything calling SHA512 (or any hashing algos, for that matter) didn't come up with anything that seemed relevant. In fact, even searching simply for the string whoareyou came up with nothing. Taking a step back, I ran a grep whoareyou on all of the files, and found out that only the binary DLL CTCDC.dll matched.

Loading this up in Ghidra and going through the X-refs, I ended up at the function that seemed to be responsible for the initial communication with the device, as evidenced by its checks for responses such as Unknown command and NotYet.

Analyzing this function, I was able to deduce that it wasn't using SHA at all, but rather some weird AES-256-GCM based authentication.

The challenge message format:

whoareyou [1E 04] [83 32] [32-byte nonce] \r\n
           │       └─ Device type (USB PID 0x3283 LE)
           └─ Challenge header
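
As a rough illustration of that layout, the following Python sketch splits a challenge line into its three fields. The `Challenge` type and `parse_challenge` name are hypothetical, and the nonce in the example is a made-up placeholder, not captured data.

```python
from typing import NamedTuple


class Challenge(NamedTuple):
    header: bytes     # 2 bytes, 1E 04 on the Katana V2X
    pid_bytes: bytes  # 2 bytes, USB PID in little-endian order
    nonce: bytes      # 32 random bytes chosen by the device


def parse_challenge(line: bytes) -> Challenge:
    """Split `whoareyou <header> <pid> <nonce> CRLF` into its fields."""
    prefix, suffix = b"whoareyou", b"\r\n"
    if not (line.startswith(prefix) and line.endswith(suffix)):
        raise ValueError("not a whoareyou challenge")
    body = line[len(prefix):-len(suffix)]
    if len(body) != 2 + 2 + 32:
        raise ValueError(f"unexpected challenge length: {len(body)}")
    return Challenge(body[0:2], body[2:4], body[4:36])


# Synthetic challenge with a placeholder nonce (0x00..0x1f) for illustration.
example = parse_challenge(
    b"whoareyou" + bytes.fromhex("1e048332") + bytes(range(32)) + b"\r\n"
)
usb_pid = int.from_bytes(example.pid_bytes, "little")
```

Checking the body length instead of searching for `\r\n` avoids misparsing a nonce that happens to contain those bytes.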

The application encrypts the device's 32-byte nonce with AES-256-GCM, using the following key:

1e 04 d3 1a 21 27 9b e3 46 f0 99 9d 6e c4 c3 fe
be 98 90 18 69 c1 18 fb b1 25 6e 0c e0 7b 83 32

The key itself isn't stored in the DLL directly, but is constructed from the challenge message itself and some static data in the DLL:

Bytes 0-1: challenge header (1E 04)
Bytes 2-3: DLL static (D3 1A)
Bytes 4-27: DLL static (24 bytes from 0x101dba78)
Bytes 28-29: DLL static (E0 7B)
Bytes 30-31: USB PID bytes from challenge (83 32)
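
The key layout can be sketched in a few lines of Python. The constant and function names are invented for this example; the static bytes are simply read off the key listed above, not extracted from the DLL here.

```python
# Static portions of the key, copied from the key bytes listed above.
STATIC_2_3 = bytes.fromhex("d31a")
STATIC_4_27 = bytes.fromhex(
    "21279be346f0999d6ec4c3fe"
    "be98901869c118fbb1256e0c"
)
STATIC_28_29 = bytes.fromhex("e07b")


def derive_key(header: bytes, pid_bytes: bytes) -> bytes:
    """Assemble the 32-byte AES-256 key from challenge fields and static data."""
    if len(header) != 2 or len(pid_bytes) != 2:
        raise ValueError("header and PID must be two bytes each")
    key = header + STATIC_2_3 + STATIC_4_27 + STATIC_28_29 + pid_bytes
    if len(key) != 32:
        raise ValueError("key must be 32 bytes")
    return key


# Challenge header 1E 04 and PID bytes 83 32, as sent by the Katana V2X.
key = derive_key(bytes.fromhex("1e04"), bytes.fromhex("8332"))
```

With the Katana V2X challenge fields, `key` reproduces the 32 bytes shown above.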

Since the challenge header is device-constant, the key is effectively hardcoded for this specific device, but I imagine this challenge-response mechanism is shared with other devices, where the key would differ.

The response is computed as follows:

Generate 16 random bytes for the iv value
Use iv[0:12] as the GCM nonce
Encrypt the 32-byte challenge nonce: (ciphertext, tag) = AES-256-GCM(key, iv[:12], nonce)
Response = "unlock" + iv + ciphertext + tag + "\r\n"
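
The steps above can be sketched in Python as follows. This is a hedged illustration, not the code from v2x-ctl: it relies on the third-party `cryptography` package for AES-GCM, assumes no additional authenticated data is used, and assumes a 16-byte tag (which is what makes iv + ciphertext + tag add up to the 64 bytes seen in the capture). All function names are made up.

```python
import os
from typing import Tuple


def encrypt_nonce(key: bytes, iv: bytes, nonce: bytes) -> Tuple[bytes, bytes]:
    """AES-256-GCM encrypt the challenge nonce; returns (ciphertext, tag)."""
    # Third-party dependency: pip install cryptography
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    # Only the first 12 bytes of the 16-byte iv are used as the GCM nonce.
    # Assumption: no associated data. encrypt() returns ciphertext || tag.
    sealed = AESGCM(key).encrypt(iv[:12], nonce, None)
    return sealed[:-16], sealed[-16:]


def build_unlock(iv: bytes, ciphertext: bytes, tag: bytes) -> bytes:
    """Frame the response as "unlock" + iv + ciphertext + tag + CRLF."""
    if (len(iv), len(ciphertext), len(tag)) != (16, 32, 16):
        raise ValueError("expected 16-byte iv, 32-byte ciphertext, 16-byte tag")
    return b"unlock" + iv + ciphertext + tag + b"\r\n"


def unlock_response(key: bytes, nonce: bytes) -> bytes:
    """Run all four steps for a given key and 32-byte challenge nonce."""
    iv = os.urandom(16)
    ciphertext, tag = encrypt_nonce(key, iv, nonce)
    return build_unlock(iv, ciphertext, tag)
```

Note that all 16 iv bytes are sent back even though only 12 feed into GCM, so the device can recover the GCM nonce from the response itself.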

This is fairly unusual - typically, the tool for proving that you know a shared secret is HMAC. I'm not sure why Creative felt the need to jump through so many hoops to make something that achieves essentially the same thing. This encryption scheme provides integrity and confidentiality, but the latter seems pointless here, as the nonce is already known to both sides. Only the integrity proof matters. Maybe I'm missing something here, but it just seems strange overall.

v2x-ctl

Having pretty much all of the missing pieces I needed, I was able to create a Rust library and CLI application called v2x-ctl (or simply v2x).

If you happen to have a Katana V2X and want to be able to control its settings from Linux, give it a try! It was made on a best-effort basis: I made sure everything more or less worked, but only on the latest FW version (1.3.230619.1820). It's also entirely possible that the application would work for other Sound Blaster devices as well, but not without at least modifying the challenge encryption key and some of the IDs I've currently hardcoded. In any case, if you try it out, let me know how it goes.

Circling back to the firmware upgrade capture, I could deduce the following packet structure for each of the payload packets:

[0:2]  5b 98         - start marker
[2:4]  remaining_len - u16le, length of everything after this field
[4]    04            - command (firmware data write)
[5]    seq           - sequence counter (resets every 32 packets)
[6:8]  payload_len   - u16le, length of firmware data in this packet
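
A minimal Python sketch of parsing this header and stitching the payloads back together might look like the following. It is not the script mentioned in this post: the names are hypothetical, packets are assumed to arrive in capture order, and `remaining_len` is read but not validated because nothing here says whether any bytes trail the firmware data.

```python
import struct
from typing import Iterable, Tuple

# marker (2 bytes), remaining_len (u16le), command, seq, payload_len (u16le)
HEADER = struct.Struct("<2sHBBH")
MARKER = b"\x5b\x98"
CMD_FW_WRITE = 0x04


def parse_fw_packet(packet: bytes) -> Tuple[int, bytes]:
    """Return (seq, firmware_bytes) for one firmware data write packet."""
    if len(packet) < HEADER.size:
        raise ValueError("packet shorter than the 8-byte header")
    marker, _remaining_len, command, seq, payload_len = HEADER.unpack_from(packet)
    if marker != MARKER:
        raise ValueError("missing 5b 98 start marker")
    if command != CMD_FW_WRITE:
        raise ValueError(f"not a firmware data write: command {command:#04x}")
    payload = packet[HEADER.size:HEADER.size + payload_len]
    if len(payload) != payload_len:
        raise ValueError("truncated firmware payload")
    return seq, payload


def reassemble(packets: Iterable[bytes]) -> bytes:
    """Concatenate payloads in the order given (assumed to be capture order)."""
    return b"".join(parse_fw_packet(p)[1] for p in packets)
```

Since the sequence counter wraps every 32 packets, it can only detect local reordering or drops, which is why the sketch relies on capture order rather than sorting by `seq`.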

Knowing this, I wrote a script that extracted the data from the capture using tshark.

I was left with a file identifying itself as CIFF, which supposedly stands for "Creative Image File Format", and is a container with different sections. The file I extracted had four types of sections: CINF (the device info, a UTF-16LE string), CIN2 (version info), DATA (the firmware binary itself), and CHK2 (the checksum).

Specifically, the CIFF format itself seems to be:

| Offset | Size | Description |
| --- | --- | --- |
| `0x00` | 4 | Magic: `CIFF` |
| `0x04` | 4 | u32 payload_size (everything after this field, up to but NOT including `CHK2`) |
| `0x08` | ... | Sections (`CINF`, `CIN2`, `DATA`..., `CHK2`) |

Each section follows the same TLV envelope:

| Offset | Size | Description |
| --- | --- | --- |
| `0x00` | 4 | Magic |
| `0x04` | 4 | u32 section_size (bytes after this field) |
| `0x08` | N | Section payload |
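
Walking such a container takes only a few lines. The Python sketch below is an illustration with a made-up function name, and it assumes the u32 fields are little-endian, which is not stated here and is only a guess based on the little-endian fields elsewhere in the protocol.

```python
import struct
from typing import Iterator, Tuple


def iter_sections(blob: bytes) -> Iterator[Tuple[str, int, bytes]]:
    """Yield (magic, offset, payload) for each section in a CIFF container."""
    if blob[:4] != b"CIFF":
        raise ValueError("missing CIFF magic")
    offset = 8  # skip the magic and the u32 payload_size
    while offset + 8 <= len(blob):
        magic = blob[offset:offset + 4]
        # Assumption: section_size is a little-endian u32.
        (size,) = struct.unpack_from("<I", blob, offset + 4)
        end = offset + 8 + size
        if end > len(blob):
            raise ValueError(f"section at {offset:#x} runs past end of file")
        yield magic.decode("ascii"), offset, blob[offset + 8:end]
        offset = end
```

For example, `[(m, hex(o)) for m, o, _ in iter_sections(data)]` would list each section's magic next to its offset.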

In the firmware file I extracted, there were multiple DATA sections with two different sub-types.

The first one is the F-type, which I named so because its name starts with F, which I assume stands for firmware. The name is a null-terminated UTF-16LE string, padded to 32 bytes, followed by the raw binary data.

The second type is the H-type, for a similar reason, but I don't really know what the H stands for in this case, maybe host resource? In any case, the name for this type is padded to 512 bytes, not just 32, but otherwise follows the same structure.
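
Splitting a DATA payload into its name and raw contents could be sketched like this in Python. The names are hypothetical, and the sketch assumes the first UTF-16LE character of the name (F or H) is what selects the name field width.

```python
from typing import Tuple

# Width of the name field in bytes, keyed by the first character of the name.
NAME_FIELD_WIDTH = {"F": 32, "H": 512}


def split_data_section(payload: bytes) -> Tuple[str, bytes]:
    """Return (name, raw_data) for the payload of one DATA section."""
    kind = payload[:2].decode("utf-16-le", errors="replace")
    width = NAME_FIELD_WIDTH.get(kind)
    if width is None:
        raise ValueError(f"unknown DATA sub-type {kind!r}")
    field = payload[:width]
    # Find the UTF-16 NUL terminator on a 2-byte boundary; ignore the padding.
    end = next(
        (i for i in range(0, len(field), 2) if field[i:i + 2] == b"\x00\x00"),
        len(field),
    )
    return field[:end].decode("utf-16-le"), payload[width:]
```

Scanning for the terminator before decoding means the padding bytes never have to be valid UTF-16.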

The CHK2 section is a 32-byte SHA-256 hash computed over all bytes from the end of the CIFF header (offset 0x08) up to the start of the CHK2 section.
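
A checksum verification along those lines can be sketched in Python with the standard library. The function name is made up, the u32 fields are assumed to be little-endian, and the CHK2 position is derived from payload_size as described in the header layout above.

```python
import hashlib
import struct


def verify_chk2(blob: bytes) -> bool:
    """Recompute the SHA-256 over the sections and compare it with CHK2."""
    if blob[:4] != b"CIFF":
        raise ValueError("missing CIFF magic")
    # Assumption: payload_size is a little-endian u32.
    (payload_size,) = struct.unpack_from("<I", blob, 4)
    chk2_offset = 8 + payload_size  # payload_size stops right before CHK2
    if blob[chk2_offset:chk2_offset + 4] != b"CHK2":
        raise ValueError("CHK2 section not found where payload_size points")
    stored = blob[chk2_offset + 8:chk2_offset + 8 + 32]
    computed = hashlib.sha256(blob[8:chk2_offset]).digest()
    return stored == computed
```

A `False` result on a freshly extracted file would most likely point to a dropped or misordered packet during reassembly.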

As an overview, here's what the firmware container I extracted looks like:

| # | Magic | Name | Offset | Size | Content |
| --- | --- | --- | --- | --- | --- |
| 0 | `CINF` | - | `0x0008` | 96 | "Creative MarvelX One" |
| 1 | `CIN2` | - | `0x0070` | 12 | Version data |
| 2 | `DATA` | `FBOOT` | `0x0084` | 231,208 | ARM32 bootloader |
| 3 | `DATA` | `FMAIN` | `0x387B4` | 1,486,904 | ARM32 main firmware (FreeRTOS) |
| 4 | `DATA` | `Hres/audio/audioprompts-en.pkg` | `0x1A37F4` | 41,472 | Audio prompts (Opus) |
| 5 | `DATA` | `Hbin/marvelX-malcolm.bin` | `0x1AD9FC` | 291,430 | 8051 MCU firmware |
| 6 | `CHK2` | - | `0x1F4C6A` | 32 | SHA-256 checksum |
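
As a quick sanity check on the table, each section should start exactly 8 bytes (magic plus size field) plus the previous section's size after the previous offset. The short Python snippet below replays that arithmetic using only the numbers from the table; the variable names are made up, and the derived `payload_size` follows from the header description rather than from a value read out of the file.

```python
ENVELOPE = 8  # 4-byte magic + 4-byte section_size

# (label, offset, size) copied from the table above.
sections = [
    ("CINF", 0x0008, 96),
    ("CIN2", 0x0070, 12),
    ("DATA FBOOT", 0x0084, 231_208),
    ("DATA FMAIN", 0x387B4, 1_486_904),
    ("DATA audioprompts-en.pkg", 0x1A37F4, 41_472),
    ("DATA marvelX-malcolm.bin", 0x1AD9FC, 291_430),
    ("CHK2", 0x1F4C6A, 32),
]

# Every section must begin where the previous one ends.
consistent = all(
    offset + ENVELOPE + size == next_offset
    for (_, offset, size), (_, next_offset, _) in zip(sections, sections[1:])
)

# payload_size covers everything after the 8-byte CIFF header up to CHK2.
payload_size = sections[-1][1] - 8
total_size = sections[-1][1] + ENVELOPE + sections[-1][2]
```

The offsets line up, which also confirms that the Size column counts the name field of each DATA section along with its binary contents.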

The BOOT and MAIN firmware images are interesting, and I will be taking a look at them next.