Deep audio embeddings for vocalisation clustering.

Affiliation

Université de Toulon, Aix Marseille Univ, CNRS, LIS, Toulon, France.

Publication information

PLoS One. 2023 Jul 10;18(7):e0283396. doi: 10.1371/journal.pone.0283396. eCollection 2023.

DOI:10.1371/journal.pone.0283396

PMID:37428759

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10332598/

Abstract

The study of non-human animals' communication systems generally relies on the transcription of vocal sequences using a finite set of discrete units. This set is referred to as a vocal repertoire, which is specific to a species or a sub-group of a species. When conducted by human experts, the formal description of vocal repertoires can be laborious and/or biased. This motivates computerised assistance for this procedure, for which machine learning algorithms represent a good opportunity. Unsupervised clustering algorithms are suited for grouping close points together, provided a relevant representation. This paper therefore studies a new method for encoding vocalisations, allowing automatic clustering to ease vocal repertoire characterisation. Borrowing from deep representation learning, we use a convolutional auto-encoder network to learn an abstract representation of vocalisations. We report on the quality of the learnt representation, as well as that of state-of-the-art methods, by quantifying their agreement with expert-labelled vocalisation types from 8 datasets of other studies across 6 species (birds and marine mammals). With this benchmark, we demonstrate that using auto-encoders improves the relevance of the vocalisation representation, which serves repertoire characterisation using a very limited number of settings. We also publish a Python package for the bioacoustic community to train their own vocalisation auto-encoders or use a pretrained encoder to browse vocal repertoires and ease unit-wise annotation.
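The agreement between an automatic clustering and expert labels can be illustrated with a small, self-contained sketch. The abstract does not name the agreement measure, so the choice of normalised mutual information (NMI, normalised by the arithmetic mean of the two entropies) is an assumption here, and the function names and toy labels are hypothetical, not taken from the published package.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two labelings of the same items."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def normalized_mutual_information(clusters, expert_labels):
    """NMI in [0, 1]; invariant to how cluster identifiers are named."""
    if len(clusters) != len(expert_labels):
        raise ValueError("both labelings must cover the same vocalisations")
    h_c, h_e = entropy(clusters), entropy(expert_labels)
    if h_c == 0.0 or h_e == 0.0:
        # Degenerate case: at least one labeling puts everything in one group.
        return 1.0 if h_c == h_e else 0.0
    return mutual_information(clusters, expert_labels) / ((h_c + h_e) / 2)


# Hypothetical toy data: cluster ids from an unsupervised algorithm versus
# vocalisation types assigned by an expert to the same six units.
cluster_ids = [0, 0, 1, 1, 2, 2]
expert_types = ["whistle", "whistle", "click", "click", "trill", "trill"]
score = normalized_mutual_information(cluster_ids, expert_types)
```

A score near 1 means the clusters recover the expert's vocalisation types up to a renaming of cluster identifiers, while a score near 0 means the two groupings are unrelated.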


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e2f8/10332598/536dcda6d65b/pone.0283396.g001.jpg

Similar articles

Deep multi-kernel auto-encoder network for clustering brain functional connectivity data.

Neural Netw. 2021 Mar;135:148-157. doi: 10.1016/j.neunet.2020.12.005. Epub 2020 Dec 11.

Cross-species parallels in babbling: animals and algorithms.

Philos Trans R Soc Lond B Biol Sci. 2021 Oct 25;376(1836):20200239. doi: 10.1098/rstb.2020.0239. Epub 2021 Sep 6.

The acoustic repertoire and behavioural context of the vocalisations of a nocturnal dasyurid, the eastern quoll (Dasyurus viverrinus).

PLoS One. 2017 Jul 7;12(7):e0179337. doi: 10.1371/journal.pone.0179337. eCollection 2017.

Global birdsong embeddings enable superior transfer learning for bioacoustic classification.

Sci Rep. 2023 Dec 18;13(1):22876. doi: 10.1038/s41598-023-49989-z.

Stacked Convolutional Denoising Auto-Encoders for Feature Representation.

IEEE Trans Cybern. 2017 Apr;47(4):1017-1027. doi: 10.1109/TCYB.2016.2536638. Epub 2016 Mar 16.

Avian vocal mimicry: a unified conceptual framework.

Biol Rev Camb Philos Soc. 2015 May;90(2):643-68. doi: 10.1111/brv.12129. Epub 2014 Jul 30.

BioCPPNet: automatic bioacoustic source separation with deep neural networks.

Sci Rep. 2021 Dec 6;11(1):23502. doi: 10.1038/s41598-021-02790-2.

Detection of ground parrot vocalisation: A multiple instance learning approach.

J Acoust Soc Am. 2017 Sep;142(3):1281. doi: 10.1121/1.4999318.

Automatic recording of individual oestrus vocalisation in group-housed dairy cattle: development of a cattle call monitor.

Animal. 2020 Jan;14(1):198-205. doi: 10.1017/S1751731119001733. Epub 2019 Aug 1.

Cited by

Bioacoustic fundamental frequency estimation: a cross-species dataset and deep learning baseline.

Bioacoustics. 2025;34(4):419-446. doi: 10.1080/09524622.2025.2500380. Epub 2025 Jun 2.

A large annotated dataset of vocalizations by common marmosets.

Sci Data. 2025 May 13;12(1):782. doi: 10.1038/s41597-025-04951-8.

Visualization and quantification of coral reef soundscapes using CoralSoundExplorer software.

PLoS Comput Biol. 2025 Apr 10;21(4):e1012050. doi: 10.1371/journal.pcbi.1012050. eCollection 2025 Apr.

A first vocal repertoire characterization of long-finned pilot whales (Globicephala melas) in the Mediterranean Sea: a machine learning approach.

R Soc Open Sci. 2024 Nov 6;11(11):231973. doi: 10.1098/rsos.231973. eCollection 2024 Nov.

Automatic detection for bioacoustic research: a practical guide from and for biologists and computer scientists.

Biol Rev Camb Philos Soc. 2025 Apr;100(2):620-646. doi: 10.1111/brv.13155. Epub 2024 Oct 17.

SqueakOut: Autoencoder-based segmentation of mouse ultrasonic vocalizations.

bioRxiv. 2024 Apr 23:2024.04.19.590368. doi: 10.1101/2024.04.19.590368.

Soundscape Characterization Using Autoencoders and Unsupervised Learning.

Sensors (Basel). 2024 Apr 18;24(8):2597. doi: 10.3390/s24082597.

Vocal complexity in a socially complex corvid: gradation, diversity and lack of common call repertoire in male rooks.

R Soc Open Sci. 2024 Jan 10;11(1):231713. doi: 10.1098/rsos.231713. eCollection 2024 Jan.

