Show HN: Resonate – real-time high temporal resolution spectral analysis

Original link: https://alexandrefrancois.org/Resonate/

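Resonate is a new approach to real-time spectral analysis of audio signals that offers an efficient alternative to FFT-based spectrograms. It uses a bank of resonators, each tuned to a specific frequency, that accumulate signal energy with an exponentially weighted moving average (EWMA). The EWMA gives more weight to recent input, mimicking human perception. Each resonator's state is a complex number updated iteratively by a simple formula that needs very little computation and memory. A single parameter, alpha, controls the resonator's dynamics and can depend on frequency. Optionally, an additional EWMA smooths the resonator state. Unlike the FFT, Resonate allows flexible, perceptually relevant frequency scales and offers superior temporal resolution. It is computationally efficient, scales linearly with the number of resonators, and is easy to parallelize. Open-source implementations in Python, C++, and Swift are available, demonstrating real-time spectrogram generation and audio feature extraction.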

Hacker News: Show HN: Resonate – real-time high temporal resolution spectral analysis (alexandrefrancois.org), submitted by arjf 35 minutes ago, 6 points, 1 comment.

james_a_craig (9 minutes ago): For some reason, the value of π given in the C++ code is wrong! The source gives 3.14159274101257324219, whereas the correct value to the same number of digits is 3.14159265358979323846. Very odd. I noticed this when I went to look at the C++ code to see how the algorithm is actually implemented. https://github.com/alexandrefrancois/noFFT/blob/main/src/Res... line 31.
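The π value quoted in that comment coincides with π rounded to IEEE 754 single precision. The sketch below only checks that numerical coincidence; whether the C++ source actually declares the constant as a 32-bit float is an assumption that the text here does not confirm.

```python
import math
import struct

# Round-trip math.pi through a 32-bit float to get the nearest single-precision value.
pi_f32 = struct.unpack("<f", struct.pack("<f", math.pi))[0]

print(f"{pi_f32:.20f}")  # 3.14159274101257324219
```

If the constant is indeed stored in single precision, the digits after the seventh decimal place carry no information, which would make the printed value a formatting artifact rather than a typo.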

Original text

Overview

Resonate builds on a resonator model that accumulates the signal contribution around its resonant frequency in the time domain using the Exponentially Weighted Moving Average (EWMA), which in signal-processing terms is a simple first-order low-pass filter. Consistent with online perceptual signal analysis, the EWMA gives more weight to recent input values, while the contributions of older values decay exponentially. A compact, iterative formulation of the model computes an update at each input sample, requires no buffering, and involves only a handful of arithmetic operations.

Each resonator, characterized by its resonant frequency \(f = \frac{\omega}{2\pi}\), is described by a complex number \(R\) whose amplitude captures the contribution of the input signal component around frequency \(f\). The formulas below capture the recursive update for \(R\) by way of a phasor \(P\), applied for each sample \(x\) of a real-valued input signal \(x(t) \in [-1,1]\), regularly sampled at sampling rate \(sr\). \(\Delta t=1/sr\) is the sample duration, and \(\alpha \in [0,1]\) is a constant parameter that dictates how much each new measurement affects the accumulated value.

\[P \leftarrow P e^{-i \omega \Delta t}\] \[R \leftarrow (1-\alpha) R + \alpha x P\]
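The two update rules can be sketched in a few lines of Python. This is a minimal illustration, not the noFFT implementation: the function name `resonator_magnitude`, the initial values \(P = 1\) and \(R = 0\), and the test tone are all assumptions made for the example.

```python
import cmath
import math


def resonator_magnitude(samples, freq_hz, sample_rate, alpha):
    """Run one resonator over `samples` and return |R| after the last sample."""
    omega = 2.0 * math.pi * freq_hz
    dt = 1.0 / sample_rate
    step = cmath.exp(-1j * omega * dt)  # per-sample rotation e^{-i omega dt}
    p = 1.0 + 0.0j  # phasor P (initial phase assumed to be 0)
    r = 0.0 + 0.0j  # resonator state R (assumed to start at rest)
    for x in samples:
        p *= step                              # P <- P e^{-i omega dt}
        r = (1.0 - alpha) * r + alpha * x * p  # R <- (1 - alpha) R + alpha x P
    return abs(r)


sr = 8000
tone = [math.sin(2.0 * math.pi * 440.0 * n / sr) for n in range(sr)]  # 1 s at 440 Hz
on_freq = resonator_magnitude(tone, 440.0, sr, alpha=0.01)
off_freq = resonator_magnitude(tone, 880.0, sr, alpha=0.01)
```

For a unit-amplitude sine, the resonator tuned to the tone settles near a magnitude of 0.5 (a real sinusoid splits its amplitude between a positive and a negative frequency component), while the mistuned resonator stays close to zero.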

The two complex numbers \(P\) and \(R\) capture the full state of the resonator. Updating the state at each input sample requires only a handful of arithmetic operations. The power and magnitude are not needed for the update, so they can be computed only when the application requires them, which is also relatively cheap. The single parameter \(\alpha\), which can be related to a time constant, governs the dynamics of the system. For the frequency range of interest in audio applications (20-20000 Hz), the function \(\alpha_f = 1-e^{-\Delta t\frac{f}{\log(1+f)} }\) is a reasonable heuristic. The smoothed state \(\tilde{R}\) is produced by applying the EWMA to \(R\) with the same \(\alpha\) to dampen power and phase oscillations. Finally, the output of each resonator is optionally normalized by the total response across the bank to a step signal of the resonator's frequency (equalization).
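The heuristic for \(\alpha_f\) and the second smoothing pass can be sketched as follows. Several things here are assumptions: the logarithm is taken to be the natural log, the helper names are hypothetical, and the time constant uses the standard EWMA relation \(\tau = -\Delta t / \ln(1-\alpha)\), which the text alludes to but does not spell out.

```python
import math


def alpha_heuristic(freq_hz, sample_rate):
    """alpha_f = 1 - exp(-dt * f / log(1 + f)), with log assumed to be natural."""
    dt = 1.0 / sample_rate
    return 1.0 - math.exp(-dt * freq_hz / math.log(1.0 + freq_hz))


def time_constant(alpha, sample_rate):
    """Time for an EWMA weight to decay by a factor of e: tau = -dt / ln(1 - alpha)."""
    return -(1.0 / sample_rate) / math.log(1.0 - alpha)


def smooth(states, alpha):
    """Second EWMA pass over a sequence of complex states R, yielding R~."""
    smoothed, out = 0.0 + 0.0j, []
    for r in states:
        smoothed = (1.0 - alpha) * smoothed + alpha * r
        out.append(smoothed)
    return out


sr = 44100
alphas = {f: alpha_heuristic(f, sr) for f in (20.0, 440.0, 20000.0)}
taus = {f: time_constant(a, sr) for f, a in alphas.items()}
```

Under these assumptions the time constant reduces to \(\tau = \ln(1+f)/f\) seconds, independent of the sampling rate: low-frequency resonators integrate over a longer window than high-frequency ones.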

Banks of resonators, independently tuned to perceptually relevant frequency scales, compute an instantaneous, perceptually relevant estimate of the spectral content of an input signal in real time. Both the memory and the per-sample computational complexity of such a bank are linear in the number of resonators, and independent of the number of input samples processed or the duration of the processed signal. Furthermore, since the resonators are independent, there is no constraint on the tuning of their resonant frequencies or time constants, and all per-sample computations can be parallelized across resonators. In an offline processing context, the cumulative computational cost for a given duration increases linearly with the number of input samples processed.
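A bank can be sketched as a set of independent resonators that share nothing but the input sample. The class below is a hypothetical pure-Python illustration (the name `ResonatorBank` and its methods are not taken from the noFFT or Oscillators code); it makes the linear memory and per-sample cost visible as two lists and one loop.

```python
import cmath
import math


class ResonatorBank:
    """Independent resonators; the state is two complex numbers per resonator."""

    def __init__(self, freqs_hz, sample_rate, alphas):
        dt = 1.0 / sample_rate
        self.steps = [cmath.exp(-2j * math.pi * f * dt) for f in freqs_hz]
        self.alphas = list(alphas)  # one alpha per resonator, freely chosen
        self.p = [1.0 + 0.0j] * len(self.steps)
        self.r = [0.0 + 0.0j] * len(self.steps)

    def update(self, x):
        """Process one input sample with a single pass over the resonators."""
        for k, (step, a) in enumerate(zip(self.steps, self.alphas)):
            self.p[k] *= step
            self.r[k] = (1.0 - a) * self.r[k] + a * x * self.p[k]

    def powers(self):
        """Power per resonator, computed only when the caller asks for it."""
        return [abs(r) ** 2 for r in self.r]


sr = 8000
bank = ResonatorBank([220.0, 440.0, 880.0], sr, [0.01, 0.01, 0.01])
for n in range(sr):  # one second of a 440 Hz tone
    bank.update(math.sin(2.0 * math.pi * 440.0 * n / sr))
powers = bank.powers()
```

Because no iteration of the inner loop reads another resonator's state, the loop body could be mapped onto SIMD lanes or threads without changing the result.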

Spectrograms

Spectral information as a function of time is typically presented graphically for human consumption in the form of a spectrogram, in which the horizontal axis represents time and the vertical axis represents frequency. The value at each point represents the power of the frequency in the input signal at the given time slice. These values are usually normalized by the maximum value over the signal, and mapped to a logarithmic color scale to produce plots like those shown below. A Resonate oscillator bank with adequately tuned resonators computes an arbitrary frequency scale spectrogram directly and efficiently, with more relevant frequency resolution and much higher temporal resolution than FFT-based methods.
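As a rough sketch of how such a spectrogram could be assembled, the code below runs a small bank over a signal, records the power of every resonator once per hop, then normalizes by the global maximum and converts to decibels. The function names, the single shared `alpha`, and the -80 dB floor are assumptions chosen for illustration.

```python
import cmath
import math


def resonate_spectrogram(samples, freqs_hz, sample_rate, alpha, hop):
    """Return a list of frames; each frame holds one power value per frequency."""
    dt = 1.0 / sample_rate
    steps = [cmath.exp(-2j * math.pi * f * dt) for f in freqs_hz]
    p = [1.0 + 0.0j] * len(steps)
    r = [0.0 + 0.0j] * len(steps)
    frames = []
    for n, x in enumerate(samples, start=1):
        for k, step in enumerate(steps):
            p[k] *= step
            r[k] = (1.0 - alpha) * r[k] + alpha * x * p[k]
        if n % hop == 0:  # the state is current at every sample; hop only thins the output
            frames.append([abs(v) ** 2 for v in r])
    return frames


def to_db(frames, floor_db=-80.0):
    """Normalize by the global maximum and map to decibels, clipped at floor_db."""
    peak = max(max(frame) for frame in frames)
    return [[max(floor_db, 10.0 * math.log10(v / peak)) if v > 0.0 else floor_db
             for v in frame] for frame in frames]


sr = 8000
# Half a second at 440 Hz followed by half a second at 880 Hz.
signal = [math.sin(2.0 * math.pi * (440.0 if n < sr // 2 else 880.0) * n / sr)
          for n in range(sr)]
frames = resonate_spectrogram(signal, [440.0, 880.0], sr, alpha=0.01, hop=400)
db = to_db(frames)
```

The hop length here is purely a display choice: the resonator state is updated at every sample, so a smaller hop yields more frames without any extra windowing or overlap computation.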

Figure: Log-frequency scale spectrograms. Log-frequency power spectrograms of Librosa's vibeace music example, computed from the constant-Q transform (CQT) and from a Resonate implementation (spectrogram display and CQT from Librosa, sampling rate: 22050 Hz, hop length: 512 samples, 100 frequency bins from 32.7 Hz to 9955.1 Hz, 12 bins per octave).
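The bin frequencies quoted in this caption follow from a geometric progression with 12 bins per octave. A small sketch, with the helper name `log_frequencies` being hypothetical:

```python
def log_frequencies(f_min, n_bins, bins_per_octave):
    """Geometrically spaced frequencies: each octave is split into equal ratios."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


freqs = log_frequencies(32.7, 100, 12)
top = freqs[-1]  # 32.7 * 2 ** (99 / 12), about 9955.1 Hz
```

Each of these frequencies can be used directly as a resonator tuning, since the resonators place no constraint on how the frequencies are spaced.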

Figure: Mel-frequency scale spectrograms. Mel-frequency power spectrograms of Librosa's Libri3 speech sample, computed from the constant-Q transform (CQT) and from a Resonate implementation (spectrogram display and CQT from Librosa, sampling rate: 22050 Hz, hop length: 32 samples, 128 frequency bins from 0 to 8000 Hz).
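One way to obtain 128 mel-spaced tunings between 0 and 8000 Hz is sketched below. It assumes the HTK mel formula, \(m = 2595 \log_{10}(1 + f/700)\); Librosa's default mel scale is the Slaney variant, so the exact values used for the figure may differ. The function names are hypothetical.

```python
import math


def hz_to_mel(f):
    """HTK-style mel scale (an assumption; other mel variants exist)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_frequencies(n_bins, f_min, f_max):
    """n_bins frequencies equally spaced on the mel axis between f_min and f_max."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    return [mel_to_hz(lo + (hi - lo) * k / (n_bins - 1)) for k in range(n_bins)]


freqs = mel_frequencies(128, 0.0, 8000.0)
```

With the update rule given earlier, a resonator tuned to 0 Hz keeps \(P\) constant, so its state is simply the EWMA of the input signal.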

Publications

Alexandre R.J. François, “Resonate: Efficient Low Latency Spectral Analysis of Audio Signals,” to appear in Proceedings of the 50th Anniversary of the International Computer Music Conference 2025, Boston, MA, USA, 8-14 June 2025.

Resources

The open-source Python module noFFT provides Python and C++ implementations of Resonate functions and Jupyter notebooks illustrating their use in offline settings.
The open-source Oscillators Swift package contains reference implementations in Swift and C++. The Oscillators app demonstrates real-time spectrograms and derived audio features.
The Resonate YouTube playlist features video captures of real-time demonstrations.