Co-designing a sparse music codec with ChatGPT o3

Original link: https://akuz.me/co-designing-a-sparse-music-codec-with-chatgpt-o3-in-one-day-my-mini-pied-piper.html

Using ChatGPT o3 as a design partner, the author rapidly prototyped a new kind of electronic-music compressor. The goal is to represent a spectrogram as a sparse set of reusable patterns, placed freely on a grid and positioned precisely using phase information. Through iterative conversation and coding with o3, they worked out the mathematical model, implemented the core modules (patterns, occurrences, a differentiable lattice writer), and created a training script. Initial debugging was done by visualising the reconstruction with ASCII heat-maps. The key innovation is allowing patterns to be placed anywhere on the grid via continuous phase parameters, enabling higher compression efficiency than traditional grid-locked approaches. A working prototype capable of compressing a synthetic test grid was achieved within a single day, with no need for an extensive design document or long coding sessions. The rapid development cycle demonstrates AI's potential to accelerate R&D in audio processing and beyond. The prototype code has been uploaded to GitHub.

This Hacker News thread explores the transformative impact of AI tools such as ChatGPT and Claude on software and hardware development. Users share their experiences using AI to rapidly prototype and finish projects they had previously considered too time-consuming or beyond their own skills. Many highlight the ability to pair-program with AI assistants as a way to learn new languages and frameworks. While some caution against relying solely on AI for complex tasks, pointing to its limitations with intricate logic and with following specific preferences, others have succeeded by guiding the AI and iteratively refining the generated code. Several users emphasise AI's value in speeding up boilerplate code and assisting with unfamiliar languages.

Original text

For years I’ve wanted to build a super-dense electronic-music compressor: keep only the loops and phase cues that really matter, then re-synthesise the track perfectly. Evenings and weekends, however, were never long enough to design the model, write the maths, and wrangle PyTorch. Recently I opened ChatGPT running the new o3 model and treated it as a design partner. If we could keep the conversation focused, perhaps we could sketch—and prototype—the entire idea in a single stretch.

Generative Architecture Illustration

Co-designing the generative model

We started by deciding how the data should look. I wanted a phase-aware spectrogram—complex numbers on an 𝐹 × 𝑁 grid—rebuilt from a handful of reusable patterns and a sparse list of occurrences. I proposed details; o3 replied with equations. We swapped 3 × 3 windows for 5 × 5, removed global gains then re-introduced per-occurrence magnitudes, and replaced hard clamping with bilinear interpolation so gradients would flow. After several iterations we froze a checkpoint: unit-normalised patterns, fractional offsets encoded as phases, occurrences positioned by two complex numbers rather than fixed indices. o3 typeset the whole formulation in LaTeX, and I compiled it into a concise PDF.
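For concreteness, here is a plausible sketch of that frozen formulation in my own notation (not the exact LaTeX from the chat): the target is a complex spectrogram $X \in \mathbb{C}^{F \times N}$, each pattern $P_k \in \mathbb{C}^{5 \times 5}$ is unit-normalised, and each occurrence $m$ carries a pattern index $k_m$, a complex magnitude $a_m$, and two unit-complex positions $u_m, v_m$. The bilinear kernel $w(d) = \max(0, 1 - |d|)$ and the exact coordinate mapping are assumptions.

```latex
% A sketch of the reconstruction and loss, under the assumptions stated above.
\begin{align*}
  f_m &= F \left( \tfrac{\arg u_m}{2\pi} \bmod 1 \right), \qquad
  n_m  = N \left( \tfrac{\arg v_m}{2\pi} \bmod 1 \right) \\[4pt]
  \hat{X}[f, n] &= \sum_{m=1}^{M} a_m
      \sum_{i=-2}^{2} \sum_{j=-2}^{2}
      w\!\bigl(f - f_m - i\bigr)\, w\!\bigl(n - n_m - j\bigr)\, P_{k_m}[i, j] \\[4pt]
  \mathcal{L} &= \bigl\lVert X - \hat{X} \bigr\rVert_2^2
\end{align*}
```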

Formulae illustration

Implementing—and debugging—the first learning loop

o3 then produced a clean repo: separate modules for patterns, occurrences, a differentiable lattice writer, and a training script. The first run showed falling loss yet every pattern remained zero. In chat we traced the issue to hard gates that silenced magnitudes before gradients could reach them; replacing the mask with soft weights solved the problem immediately, and patterns began to develop non-zero amplitudes and phases. For visibility we added a simple ASCII heat-map that printed the target spectrogram, the reconstruction, and their difference directly in the terminal.
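To illustrate the gradient problem and its fix, here is a minimal PyTorch sketch; the gate shape, threshold, and sharpness are my assumptions rather than the repo's exact code.

```python
import torch

# Hypothetical illustration of the bug: a hard gate zeroes small magnitudes,
# so the gradient w.r.t. those magnitudes is exactly zero and they can never grow.
def hard_gate(mag: torch.Tensor, threshold: float = 1e-3) -> torch.Tensor:
    return mag * (mag.abs() > threshold).float()

# The kind of smooth weight the chat converged on (the exact shape here is assumed):
# even tiny magnitudes receive a small but non-zero gradient and can develop.
def soft_weight(mag: torch.Tensor, threshold: float = 1e-3,
                sharpness: float = 50.0) -> torch.Tensor:
    return mag * torch.sigmoid(sharpness * (mag.abs() - threshold))

mag = torch.full((8,), 1e-4, requires_grad=True)
soft_weight(mag).sum().backward()
print(mag.grad)  # small but non-zero, unlike with hard_gate
```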

ASCII illustrations for debugging
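A minimal sketch of such a heat-map printer is below; the character ramp and the normalisation are assumptions, and the repo's version may differ.

```python
import numpy as np

RAMP = " .:-=+*#%@"  # characters from quiet to loud (ramp choice is an assumption)

def ascii_heatmap(grid: np.ndarray, title: str = "") -> str:
    # Render the magnitude of a complex grid as a block of ASCII characters.
    mag = np.abs(grid)
    mag = mag / (mag.max() + 1e-12)                # normalise to [0, 1]
    idx = (mag * (len(RAMP) - 1)).astype(int)      # map to ramp indices
    rows = ["".join(RAMP[i] for i in row) for row in idx]
    return "\n".join(([title] if title else []) + rows)

# During training: print the target, the reconstruction, and their difference.
target = np.exp(1j * np.random.rand(16, 48)) * np.random.rand(16, 48)
recon = 0.7 * target
for name, g in [("target", target), ("reconstruction", recon), ("difference", target - recon)]:
    print(ascii_heatmap(g, name))
```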

I initialised the data (a grid of complex numbers) to a weave-like pattern (the ASCII representation of the magnitude is shown below):

Target Data

With 5000 occurrences of only 4 patterns, the algorithm was able to compress around 1/3 of the data. (The number of occurrences could obviously be increased, but I kept this result to show how the compression is limited by the constraints of the algorithm, namely the number and size of the patterns and the number of occurrences.)

Reconstruction

The ASCII illustration below shows the part of the data that is not described by the algorithm, due to the limited number of patterns and occurrences.

Missing part

One working day later...

By the evening the model could reconstruct a synthetic test grid with a small dictionary and far fewer occurrences than pixels. No extensive design document, no weekend-long coding marathon—just a day of iterative conversation with an AI partner. Next steps are clear: push the code to GitHub, train on real electronic tracks, and measure how low we can take the bitrate.

What makes this prototype different

The crucial detail is that occurrences are not tied to the lattice. Each centre is stored as two unit-complex numbers whose phases map to continuous coordinates, so patterns can be placed anywhere—even between grid cells—while gradients still flow. A single pattern can therefore be reused at arbitrary offsets instead of being cloned for every shift. This first experiment shows that phase-parametrised placement can turn a dense spectrogram into a sparse set of grid-free building blocks, opening the door to extremely compact music compression.
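As a sketch of this mechanism (my reading of the description, not the repository's code; function names and shapes are assumptions, and for simplicity each occurrence scatters a single real value rather than a full 5 × 5 complex pattern):

```python
import math
import torch

def phase_to_coord(z: torch.Tensor, size: int) -> torch.Tensor:
    # Map a unit-complex number's angle to a continuous coordinate in [0, size).
    return (torch.angle(z) / (2 * math.pi) % 1.0) * size

def bilinear_write(canvas: torch.Tensor, value: torch.Tensor,
                   row: torch.Tensor, col: torch.Tensor) -> torch.Tensor:
    # Scatter value[m] at continuous (row[m], col[m]) by splitting it across the
    # four surrounding cells with bilinear weights; differentiable w.r.t. value, row, col.
    n_rows, n_cols = canvas.shape
    r0, c0 = row.floor().long(), col.floor().long()
    dr, dc = row - r0, col - c0
    for di, wr in ((0, 1 - dr), (1, dr)):
        for dj, wc in ((0, 1 - dc), (1, dc)):
            idx_r = (r0 + di).clamp(0, n_rows - 1)
            idx_c = (c0 + dj).clamp(0, n_cols - 1)
            canvas = canvas.index_put((idx_r, idx_c), value * wr * wc, accumulate=True)
    return canvas

# Usage: drop five occurrences onto a 64 x 256 lattice at grid-free positions.
M = 5
u = torch.polar(torch.ones(M), torch.rand(M) * 2 * math.pi)  # row phases
v = torch.polar(torch.ones(M), torch.rand(M) * 2 * math.pi)  # column phases
mag = torch.rand(M, requires_grad=True)
canvas = bilinear_write(torch.zeros(64, 256), mag,
                        phase_to_coord(u, 64), phase_to_coord(v, 256))
canvas.sum().backward()  # gradients reach mag; u and v would receive them too if they required grad
```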

Conclusion

Working with ChatGPT o3 felt like pairing with an always-awake research colleague: every question was answered instantly, every edit compiled on the spot, and roadblocks dissolved in minutes instead of months. An experiment that had lived in my “someday” notebook for years—designing a grid-free, phase-aware music compressor—went from sketch to running prototype in a single day of dialogue and iterative coding. Turning long-standing ideas into tangible results this quickly is both liberating and a glimpse of how research will feel in the very near future. Exciting times!

See the GitHub repository here.
