Best Paul, Araya-Salas Marcelo, Ekström Axel G, Freitas Bárbara, Jensen Frants H, Kershenbaum Arik, Lameira Adriano R, Lehmann Kenna D S, Linhart Pavel, Liu Robert C, Madhavan Malavika, Markham Andrew, Roch Marie A, Root-Gutteridge Holly, Šálek Martin, Smith-Vidaurre Grace, Strandburg-Peshkin Ariana, Warren Megan R, Wijers Matthew, Marxer Ricard
Université de Toulon, Aix Marseille Univ. CNRS, LIS, Toulon, France.
Escuela de Biología & Centro de Investigación en Neurociencias, Universidad de Costa Rica.
Bioacoustics. 2025;34(4):419-446. doi: 10.1080/09524622.2025.2500380. Epub 2025 Jun 2.
The fundamental frequency (F0) is a key parameter for characterising structures in vertebrate vocalisations, for instance defining vocal repertoires and their variations at different biological scales (e.g. population dialects, individual signatures). However, annotating F0 manually is too laborious, and automating the task is complex. Despite significant advancements in automatic F0 estimation in the fields of speech and music, similar progress in bioacoustics has been limited. To address this gap, we compile and publish a benchmark dataset of over 250,000 calls from 14 taxa, each paired with ground truth F0 values. These vocalisations range from infrasound to ultrasound, from high to low harmonicity, and some include non-linear phenomena. Testing different algorithms on these signals, we demonstrate the potential of neural networks for F0 estimation, even for taxa not seen in training, or when trained without labels. To indicate how applicable an algorithm is to a given set of signals, we also propose spectral measurements of F0 quality which correlate well with performance. While current performance is not satisfactory for all studied taxa, the results suggest that deep learning could yield a more generic and reliable bioacoustic F0 tracker, helping the community to analyse vocalisations via their F0 contours.
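As a rough illustration of what automatic F0 estimation involves, the sketch below implements a classical autocorrelation-based estimator for a single frame. It is not one of the algorithms evaluated in the paper, nor its neural-network approach. The function names (`synth_harmonic_tone`, `estimate_f0_autocorr`), the synthetic test tone, and the search range are all assumptions made for this example.

```python
import math


def synth_harmonic_tone(f0_hz, sample_rate, n_samples):
    """Hypothetical test signal: a fundamental plus a weaker second harmonic."""
    return [
        math.sin(2 * math.pi * f0_hz * i / sample_rate)
        + 0.5 * math.sin(2 * math.pi * 2 * f0_hz * i / sample_rate)
        for i in range(n_samples)
    ]


def estimate_f0_autocorr(frame, sample_rate, f_min, f_max):
    """Estimate the F0 (in Hz) of one frame from its autocorrelation peak.

    Only lags corresponding to frequencies in [f_min, f_max] are searched,
    so the range must bracket the expected F0 (an assumption of this sketch).
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove any DC offset

    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(n - 1, int(sample_rate / f_min))
    if lag_min >= lag_max:
        raise ValueError("search range is empty for this frame length")

    def acf(lag):
        # Average product over the overlapping samples at this lag.
        return sum(x[i] * x[i + lag] for i in range(n - lag)) / (n - lag)

    scores = [acf(lag) for lag in range(lag_min, lag_max + 1)]
    best = max(range(len(scores)), key=scores.__getitem__)
    lag = float(lag_min + best)

    # Parabolic interpolation around the peak for sub-sample lag resolution.
    if 0 < best < len(scores) - 1:
        a, b, c = scores[best - 1], scores[best], scores[best + 1]
        denom = a - 2 * b + c
        if denom != 0:
            lag += 0.5 * (a - c) / denom

    return sample_rate / lag


tone = synth_harmonic_tone(200.0, 8000, 2000)
f0 = estimate_f0_autocorr(tone, 8000, f_min=150.0, f_max=400.0)
print(f"estimated F0: {f0:.1f} Hz")  # should be close to the 200 Hz used above
```

An estimator of this kind depends on a well-chosen search range and on clear periodicity in the signal. Calls with low harmonicity or non-linear phenomena, which the abstract notes are present in the dataset, are the cases where such simple approaches can be expected to struggle.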