Bioacoustic fundamental frequency estimation: a cross-species dataset and deep learning baseline.

Author information

Best Paul, Araya-Salas Marcelo, Ekström Axel G, Freitas Bárbara, Jensen Frants H, Kershenbaum Arik, Lameira Adriano R, Lehmann Kenna D S, Linhart Pavel, Liu Robert C, Madhavan Malavika, Markham Andrew, Roch Marie A, Root-Gutteridge Holly, Šálek Martin, Smith-Vidaurre Grace, Strandburg-Peshkin Ariana, Warren Megan R, Wijers Matthew, Marxer Ricard

Affiliations

Université de Toulon, Aix Marseille Univ. CNRS, LIS, Toulon, France.

Escuela de Biología & Centro de Investigación en Neurociencias, Universidad de Costa Rica.

Publication information

Bioacoustics. 2025;34(4):419-446. doi: 10.1080/09524622.2025.2500380. Epub 2025 Jun 2.

Abstract

The fundamental frequency (F0) is a key parameter for characterising structures in vertebrate vocalisations, for instance defining vocal repertoires and their variations at different biological scales (e.g. population dialects, individual signatures). However, the task is too laborious to perform manually, and its automation is complex. Despite significant advancements in the fields of speech and music for automatic F0 estimation, similar progress in bioacoustics has been limited. To address this gap, we compile and publish a benchmark dataset of over 250,000 calls from 14 taxa, each paired with ground truth F0 values. These vocalisations range from infra-sounds to ultra-sounds, from high to low harmonicity, and some include non-linear phenomena. Testing different algorithms on these signals, we demonstrate the potential of neural networks for F0 estimation, even for taxa not seen in training, or when trained without labels. Also, to inform on the applicability of algorithms to analyse signals, we propose spectral measurements of F0 quality which correlate well with performance. While current performance results are not satisfying for all studied taxa, they suggest that deep learning could bring a more generic and reliable bioacoustic F0 tracker, helping the community to analyse vocalisations via their F0 contours.
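
For readers unfamiliar with the task, the following is a minimal sketch of what estimating F0 from a single audio frame can look like. It uses a plain autocorrelation peak search, a classical signal-processing approach chosen here only for illustration; the abstract does not name the algorithms that were benchmarked, and this is not the neural baseline it describes. The function names, the synthetic test tone, and the search bounds `fmin_hz` and `fmax_hz` are all assumptions introduced for this example.

```python
import math


def synth_harmonic(f0_hz, sample_rate, n_samples, n_harmonics=3):
    """Build a synthetic harmonic tone (hypothetical stand-in for a call).

    The signal is a sum of sinusoids at integer multiples of f0_hz,
    with amplitude 1/k for the k-th harmonic.
    """
    return [
        sum(
            math.sin(2 * math.pi * f0_hz * k * n / sample_rate) / k
            for k in range(1, n_harmonics + 1)
        )
        for n in range(n_samples)
    ]


def estimate_f0_autocorr(frame, sample_rate, fmin_hz, fmax_hz):
    """Estimate F0 in Hz as sample_rate divided by the best autocorrelation lag.

    Only lags corresponding to frequencies between fmin_hz and fmax_hz are
    searched. Each lag's score is the mean product of the frame with a
    shifted copy of itself, so shorter overlaps are not penalised.
    """
    min_lag = max(1, int(sample_rate / fmax_hz))
    max_lag = min(len(frame) - 1, int(sample_rate / fmin_hz))
    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        overlap = len(frame) - lag
        score = sum(frame[i] * frame[i + lag] for i in range(overlap)) / overlap
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag
```

Calling `estimate_f0_autocorr(synth_harmonic(400.0, 16000, 2048), 16000, fmin_hz=250.0, fmax_hz=1000.0)` searches lags 16 to 64 of a clean 400 Hz tone. The search range matters: if it were wide enough to include a lag of two periods, that lag would score about as high as the true period, which is one source of the octave errors that make the task harder on noisy or weakly harmonic vocalisations.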
