Deep perceptual embeddings for unlabelled animal sound events.

Affiliations

Machine Listening Lab, Centre for Digital Music (C4DM), Department of Electronic Engineering, Queen Mary University of London, London, United Kingdom.

Department of Psychology, Royal Holloway University of London, London, United Kingdom.

Publication information

J Acoust Soc Am. 2021 Jul;150(1):2. doi: 10.1121/10.0005475.

DOI: 10.1121/10.0005475

PMID: 34340499

Abstract

Evaluating sound similarity is a fundamental building block in acoustic perception and computational analysis. Traditional data-driven analyses of perceptual similarity are based on heuristics or simplified linear models, and are thus limited. Deep learning embeddings, often using triplet networks, have been useful in many fields. However, such networks are usually trained using large class-labelled datasets. Such labels are not always feasible to acquire. We explore data-driven neural embeddings for sound event representation when class labels are absent, instead utilising proxies of perceptual similarity judgements. Ultimately, our target is to create a perceptual embedding space that reflects animals' perception of sound. We create deep perceptual embeddings for bird sounds using triplet models. In order to deal with the challenging nature of triplet loss training with the lack of class-labelled data, we utilise multidimensional scaling (MDS) pretraining, attention pooling, and a triplet mining scheme. We also evaluate the advantage of triplet learning compared to learning a neural embedding from a model trained on MDS alone. Using computational proxies of similarity judgements, we demonstrate the feasibility of the method to develop perceptual models for a wide range of data based on behavioural judgements, helping us understand how animals perceive sounds.
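The triplet objective that the abstract refers to can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `triplet_margin_loss`, the Euclidean distance, the margin value of 1.0, and the toy 2-D embeddings are all assumptions chosen for illustration. The loss is zero when the anchor is already closer to the "similar" sound than to the "dissimilar" one by at least the margin, and positive otherwise.

```python
import numpy as np


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Mean hinge loss over a batch of (anchor, positive, negative) embeddings.

    Each argument is an array of shape (batch, dim). The margin and the use
    of Euclidean distance are illustrative assumptions, not values taken
    from the paper.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=1)  # anchor-to-similar distance
    d_neg = np.linalg.norm(anchor - negative, axis=1)  # anchor-to-dissimilar distance
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))


# Toy embeddings (hypothetical): one triplet in a 2-D embedding space.
anchor = np.array([[0.0, 0.0]])
similar = np.array([[0.0, 1.0]])     # distance 1 from the anchor
dissimilar = np.array([[3.0, 4.0]])  # distance 5 from the anchor

# Ordering respected: max(0, 1 - 5 + 1) = 0
loss_satisfied = triplet_margin_loss(anchor, similar, dissimilar)

# Ordering violated (roles swapped): max(0, 5 - 1 + 1) = 5
loss_violated = triplet_margin_loss(anchor, dissimilar, similar)
```

In a setting without class labels, as described above, the "similar" and "dissimilar" roles would come from perceptual similarity judgements or their proxies rather than from shared class membership.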


Similar articles

Learning to Evaluate Color Similarity for Histopathology Images using Triplet Networks.

ACM BCB. 2019 Sep;2019:466-474. doi: 10.1145/3307339.3342170.

Toward learning robust contrastive embeddings for binaural sound source localization.

Front Neuroinform. 2022 Nov 16;16:942978. doi: 10.3389/fninf.2022.942978. eCollection 2022.

Comparing Class-Aware and Pairwise Loss Functions for Deep Metric Learning in Wildlife Re-Identification.

Sensors (Basel). 2021 Sep 12;21(18):6109. doi: 10.3390/s21186109.

Unsupervised Event Graph Representation and Similarity Learning on Biomedical Literature.

Sensors (Basel). 2021 Dec 21;22(1):3. doi: 10.3390/s22010003.

Global birdsong embeddings enable superior transfer learning for bioacoustic classification.

Sci Rep. 2023 Dec 18;13(1):22876. doi: 10.1038/s41598-023-49989-z.

Neural Decoding of Bistable Sounds Reveals an Effect of Intention on Perceptual Organization.

J Neurosci. 2018 Mar 14;38(11):2844-2853. doi: 10.1523/JNEUROSCI.3022-17.2018. Epub 2018 Feb 13.

Cortical voice processing is grounded in elementary sound analyses for vocalization relevant sound patterns.

Prog Neurobiol. 2021 May;200:101982. doi: 10.1016/j.pneurobio.2020.101982. Epub 2020 Dec 15.

Triplet Deep Hashing with Joint Supervised Loss Based on Deep Neural Networks.

Comput Intell Neurosci. 2019 Oct 9;2019:8490364. doi: 10.1155/2019/8490364. eCollection 2019.

Learning image features with fewer labels using a semi-supervised deep convolutional network.

Neural Netw. 2020 Dec;132:131-143. doi: 10.1016/j.neunet.2020.08.016. Epub 2020 Aug 25.

Cited by

MosquitoSong+: A noise-robust deep learning model for mosquito classification from wingbeat sounds.

PLoS One. 2024 Oct 30;19(10):e0310121. doi: 10.1371/journal.pone.0310121. eCollection 2024.

Bird song comparison using deep learning trained from avian perceptual judgments.

PLoS Comput Biol. 2024 Aug 7;20(8):e1012329. doi: 10.1371/journal.pcbi.1012329. eCollection 2024 Aug.

A Review of Automated Bioacoustics and General Acoustics Classification Research.

Sensors (Basel). 2022 Oct 31;22(21):8361. doi: 10.3390/s22218361.

Recent Advances at the Interface of Neuroscience and Artificial Neural Networks.

J Neurosci. 2022 Nov 9;42(45):8514-8523. doi: 10.1523/JNEUROSCI.1503-22.2022.

Computational bioacoustics with deep learning: a review and roadmap.

PeerJ. 2022 Mar 21;10:e13152. doi: 10.7717/peerj.13152. eCollection 2022.

Toward a Computational Neuroethology of Vocal Communication: From Bioacoustics to Neurophysiology, Emerging Tools and Future Directions.

Front Behav Neurosci. 2021 Dec 20;15:811737. doi: 10.3389/fnbeh.2021.811737. eCollection 2021.
