Local compressed convex spectral embedding for bird species identification.

Affiliation

School of Computing and Electrical Engineering, IIT Mandi, Mandi, Himachal Pradesh-175005, India.

Publication information

J Acoust Soc Am. 2018 Jun;143(6):3819. doi: 10.1121/1.5042241.

Abstract

This paper proposes a multi-layer alternating sparse-dense framework for bird species identification. The framework takes audio recordings of bird vocalizations and produces compressed convex spectral embeddings (CCSE). Temporal and frequency modulations in bird vocalizations are captured by concatenating frames of the spectrogram, resulting in a high-dimensional and highly sparse super-frame-based representation. Random projections are then used to compress these super-frames. Class-specific archetypal analysis is applied to the compressed super-frames for acoustic modeling, yielding the convex-sparse CCSE representation. This representation efficiently captures species-specific discriminative information. However, many bird species exhibit high intra-species variation in their vocalizations, making it hard to model the whole repertoire of vocalizations with a single dictionary of archetypes. To overcome this, each class is clustered using Gaussian mixture models (GMM), and one dictionary of archetypes is learned for each cluster. To calculate the CCSE for any compressed super-frame, one dictionary from each class is chosen using the responsibilities of the individual GMM components. The CCSE obtained using this GMM-archetypal analysis framework is referred to as local CCSE. Experimental results show that local CCSE either outperforms or performs comparably to existing methods, including support vector machines with dynamic kernels and deep neural networks.
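
The first stages of the pipeline described in the abstract (super-frame construction, random projection, and convex coding against a dictionary) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the window width, the projection dimension, and the synthetic spectrogram are all assumptions made for illustration. In particular, the dictionary `D` here is a placeholder built from randomly chosen compressed super-frames rather than learned by archetypal analysis, and the GMM-based choice of a per-class dictionary is omitted.

```python
import numpy as np

def make_superframes(spec, width):
    """Concatenate `width` consecutive spectrogram frames into one vector.

    spec has shape (n_freq, n_frames); the result has shape
    (n_frames - width + 1, n_freq * width).
    """
    n_freq, n_frames = spec.shape
    windows = [spec[:, t:t + width].T.reshape(-1)
               for t in range(n_frames - width + 1)]
    return np.stack(windows)

def random_project(X, out_dim, rng):
    """Compress the rows of X with a Gaussian random projection matrix."""
    P = rng.standard_normal((X.shape[1], out_dim)) / np.sqrt(out_dim)
    return X @ P

def project_to_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def convex_code(x, D, n_iter=200):
    """Approximately solve min_a ||x - D a||^2 with a on the simplex.

    Uses projected gradient descent; D has shape (dim, n_atoms).
    """
    a = np.full(D.shape[1], 1.0 / D.shape[1])
    step = 1.0 / (np.linalg.norm(D, 2) ** 2 + 1e-12)  # 1 / Lipschitz constant
    for _ in range(n_iter):
        a = project_to_simplex(a - step * (D.T @ (D @ a - x)))
    return a

# Demo on synthetic data (assumption: 64 frequency bins, 50 frames).
rng = np.random.default_rng(0)
spec = rng.random((64, 50))
superframes = make_superframes(spec, width=5)
compressed = random_project(superframes, out_dim=32, rng=rng)

# Placeholder dictionary: 8 compressed super-frames picked at random.
# A real system would learn archetypes per class (and per GMM cluster).
atom_idx = rng.choice(len(compressed), size=8, replace=False)
D = compressed[atom_idx].T
codes = np.stack([convex_code(x, D) for x in compressed])
```

Each row of `codes` is a non-negative weight vector that sums to one, which is the convexity property the CCSE name refers to. In the local variant described above, the dictionary used for each class would additionally be selected according to the GMM component responsibilities for the given compressed super-frame.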
