Machine Listening Lab, Centre for Digital Music (C4DM), Department of Electronic Engineering, Queen Mary University of London, London, United Kingdom.
Department of Psychology, Royal Holloway University of London, London, United Kingdom.
J Acoust Soc Am. 2021 Jul;150(1):2. doi: 10.1121/10.0005475.
Evaluating sound similarity is a fundamental building block in acoustic perception and computational analysis. Traditional data-driven analyses of perceptual similarity rely on heuristics or simplified linear models, which limits them. Deep learning embeddings, often built with triplet networks, have proved useful in many fields. However, such networks are usually trained on large class-labelled datasets, and these labels are not always feasible to acquire. We explore data-driven neural embeddings for sound event representation when class labels are absent, using proxies of perceptual similarity judgements instead. Ultimately, our target is a perceptual embedding space that reflects animals' perception of sound. We create deep perceptual embeddings for bird sounds using triplet models. To cope with the difficulty of triplet loss training without class-labelled data, we use multidimensional scaling (MDS) pretraining, attention pooling, and a triplet mining scheme. We also evaluate the advantage of triplet learning over a neural embedding learned from a model trained on MDS alone. Using computational proxies of similarity judgements, we demonstrate that the method can feasibly be used to develop perceptual models for a wide range of data based on behavioural judgements, helping us understand how animals perceive sounds.
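To illustrate the triplet objective the abstract refers to, here is a minimal NumPy sketch of a standard hinge-style triplet margin loss. It is not the authors' implementation: the function name `triplet_margin_loss`, the Euclidean distance, the margin of 1.0, and the toy 2-D embeddings are all assumptions made for illustration. Each triplet asks that the anchor lie closer to the item judged more similar (the positive) than to the item judged less similar (the negative), by at least the margin.

```python
import numpy as np


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Mean hinge triplet loss over a batch of embeddings (one row per triplet).

    Hypothetical helper for illustration; distance and margin are assumptions.
    """
    # Euclidean distance from each anchor to its positive and to its negative.
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    # A triplet contributes zero once the negative is at least `margin`
    # farther from the anchor than the positive is.
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))


# Toy embeddings (made up): the first triplet already satisfies the margin,
# the second violates it.
anchor = np.array([[0.0, 0.0], [1.0, 1.0]])
positive = np.array([[0.0, 1.0], [1.0, 1.0]])
negative = np.array([[3.0, 0.0], [1.0, 1.5]])

loss = triplet_margin_loss(anchor, positive, negative, margin=1.0)
print(loss)
```

In a setting like the one described, the triplets would come from similarity judgements or their proxies rather than class labels, and a mining scheme would choose which triplets to train on; neither is modelled in this sketch.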