Ludeña-Choez Jimmy, Quispe-Soncco Raisa, Gallardo-Antolín Ascensión
Facultad de Ingeniería y Computación, Centro de Investigación en Electrónica y Telecomunicaciones (CIET), Grupo de Investigación en Ciencia y Tecnología de Materiales (CITEM), Universidad Católica San Pablo, Arequipa, Perú.
Department of Signal Theory and Communications, Universidad Carlos III de Madrid, Madrid, Spain.
PLoS One. 2017 Jun 19;12(6):e0179403. doi: 10.1371/journal.pone.0179403. eCollection 2017.
Feature extraction for Acoustic Bird Species Classification (ABSC) tasks has traditionally been based on parametric representations that were specifically developed for speech signals, such as Mel Frequency Cepstral Coefficients (MFCC). However, the discrimination capabilities of these features for ABSC could be enhanced by accounting for the vocal production mechanisms of birds and, in particular, the spectro-temporal structure of bird sounds. In this paper, a new front-end for ABSC is proposed that incorporates this specific information through the non-negative decomposition of bird sound spectrograms. It consists of two stages: short-time feature extraction and temporal feature integration. In the first stage, which aims to provide a better spectral representation of bird sounds on a frame-by-frame basis, two methods are evaluated. In the first method, cepstral-like features (NMF_CC) are extracted using a filter bank that is learned automatically by applying Non-Negative Matrix Factorization (NMF) to bird audio spectrograms. In the second method, the features (H_CC) are derived directly from the activation coefficients of the NMF decomposition of the spectrogram. The second stage summarizes the most relevant information contained in the short-time features by computing several statistical measures over long segments. The experiments show that the use of NMF_CC and H_CC in conjunction with temporal integration significantly improves the performance of a Support Vector Machine (SVM)-based ABSC system with respect to conventional MFCC.
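The two-stage front-end described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the NMF cost function, the number of basis vectors, the number of cepstral coefficients, the segment length, or which statistics are used, so every one of those choices below is an assumption. In particular, the Euclidean multiplicative updates, the log-plus-DCT step applied to the activations for H_CC, and the use of mean and standard deviation for temporal integration are illustrative guesses, and all function names are hypothetical.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (freq x frames) as W @ H.
    Uses Euclidean multiplicative updates (an assumed cost function)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def activations(V, W, n_iter=200, eps=1e-9, seed=0):
    """Estimate H for a new spectrogram V while keeping the basis W fixed."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

def dct2(X, n_coef):
    """Unnormalized DCT-II along axis 0, keeping the first n_coef rows."""
    n = X.shape[0]
    k = np.arange(n_coef)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return basis @ X

def nmf_cc(V, W, n_coef=13, eps=1e-9):
    """Cepstral-like features: use the learned basis W as a filter bank,
    then take the log and a DCT, in analogy with MFCC."""
    energies = W.T @ V
    return dct2(np.log(energies + eps), n_coef)

def h_cc(V, W, n_coef=13, eps=1e-9):
    """Features from NMF activations; the log and DCT are assumptions."""
    H = activations(V, W)
    return dct2(np.log(H + eps), n_coef)

def integrate(F, seg_len):
    """Temporal integration: mean and standard deviation of each feature
    over non-overlapping segments of seg_len frames (assumed statistics)."""
    n_seg = F.shape[1] // seg_len
    segs = F[:, :n_seg * seg_len].reshape(F.shape[0], n_seg, seg_len)
    return np.concatenate([segs.mean(axis=2), segs.std(axis=2)], axis=0)

# Synthetic stand-in for a magnitude spectrogram (64 bins x 200 frames).
V = np.random.default_rng(1).random((64, 200))
W, H = nmf(V, rank=16)
short_time = nmf_cc(V, W, n_coef=13)          # shape (13, 200)
segment_feats = integrate(short_time, 50)     # shape (26, 4)
print(short_time.shape, segment_feats.shape)
```

In a full system, each column of `segment_feats` would be one input vector for a classifier such as the SVM mentioned in the abstract; the classifier is left out here to keep the sketch dependent on NumPy only.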