Neural guitar pedal – optimizing NAM for the Daisy Seed Arm Cortex-M7
Running Neural Amp Modeler on embedded hardware

Original link: https://www.tone3000.com/blog/running-nam-on-embedded-hardware

Neural Amp Modeler (NAM) is expanding beyond the desktop into embedded systems such as guitar pedals and single-board computers. To prepare for their next-generation architecture (A2), the developers studied the challenges of running NAM on resource-constrained hardware, using the Electrosmith Daisy Seed platform. Initial tests showed a serious performance problem: processing 2 seconds of audio took over 5 seconds. The core issues were model size, inefficient linear algebra (via Eigen), and the difficulty of loading standard NAM files (.nam) without an operating system. To address these, they optimized Eigen for the small matrix sizes NAM uses, developed a compact binary model format (.namb) for easier loading (converted via a companion app), and used a smaller model variant (A1-nano) with a ReLU activation function. These improvements cut processing time to roughly 1.5 seconds, leaving headroom for additional effects. This hands-on experience directly informed the design of A2, in particular their "slimmable NAM" approach, which lets models adapt to different hardware capabilities. The project's full source code is being released publicly.


Full article

NAM has come a long way from its origins as a desktop plugin. Today it runs on single-board computers, guitar pedals like the Darkglass Anagram, and even web browsers. As we work on Architecture 2 — our next-generation NAM architecture built specifically to run on more hardware — we wanted to go hands-on and understand exactly what embedded NAM deployment looks like in practice.

So we built a NAM loader for the Electrosmith Daisy Seed: an ARM Cortex-M7 board that's become a popular foundation for DSP-based audio products, from eurorack modules to commercial guitar pedals.

Here's what we found...

The challenge: NeuralAmpModelerCore wasn't designed for this

The NeuralAmpModelerCore library has been battle-tested in the NAM plugin — but that plugin runs on a desktop with gigabytes of RAM, an operating system, and no hard deadline on how long processing can take. Embedded hardware is a completely different world: tight memory limits, no OS, and a strict real-time budget that audio simply cannot exceed.

When we first ran our implementation on the Daisy Seed, using a model small enough to fit on the device (A1-Nano with the tanh activation replaced by ReLU), processing 2 seconds of audio took over 5 seconds of compute time. For a guitar pedal, that's obviously a non-starter.

The problems broke down into three areas: model size (neural networks carry a memory footprint that needs to fit within embedded constraints), compute efficiency (the linear algebra library the codebase relies on wasn't optimized for the small matrix sizes NAM actually uses), and model loading (parsing the standard .nam JSON format on a device with no OS and very limited RAM is harder than it sounds).

What we did about it

We started by profiling the code to understand exactly where time was being spent, rather than guessing. The main bottleneck turned out to be Eigen — specifically, how it handles matrix multiplications for small, fixed-size matrices, which is exactly what NAM inference uses. We added specialized routines tuned to the matrix sizes that actually appear in NAM models, alongside several other targeted improvements.

For the loading problem, we developed a new compact binary model format (which we called “.namb”) designed as a drop-in alternative to .nam for use on embedded devices. The idea is that a companion app — running on your phone or desktop — converts .nam files from the TONE3000 library into the compact format, then transfers them to the device over Bluetooth or USB. No model conversion, no quality loss, just a leaner representation that works within the device's constraints.

For this initial exploration, we used a smaller model variant (A1-nano) with a ReLU activation function in place of tanh, a well-established swap that cuts compute cost significantly. (If you want the full technical breakdown of all of this, including what we tried that didn't work, check out our engineering post.)

Results

After optimization, the same model that originally took over 5 seconds to process 2 seconds of audio now runs in approximately 1.5 seconds — with compute headroom left over for effects processing before and after the NAM block.

More importantly, this work gave us a concrete picture of the embedded NAM problem that directly feeds into A2 design. Slimmable NAM — our approach to letting a single model adapt its compute requirements to the hardware it's running on — is a direct result of what we learned from our partners and observed in this experiment.

If you are interested in more details on these experiments, we are publishing all of the source code that was developed for them, including:

  1. Merging the numerical optimizations directly into NeuralAmpModelerCore
  2. Publishing the nam-binary-loader tools and library
  3. Publishing the example Daisy code as a blueprint for the development of other NAM pedals at nam-pedal
  4. Detailed discussion of the microbenchmarks (what worked, what didn't) at João's blog