
Convolutional Neural Networks for the Identification of African Lions from Individual Vocalizations.

Authors

Trapanotto Martino, Nanni Loris, Brahnam Sheryl, Guo Xiang

Affiliations

Department of Information Engineering, University of Padua, Via Gradenigo 6, 35131 Padova, Italy.

Information Technology and Cybersecurity, Missouri State University, 901 S. National, Springfield, MO 65897, USA.

Publication

J Imaging. 2022 Apr 1;8(4):96. doi: 10.3390/jimaging8040096.

Abstract

The classification of vocal individuality for passive acoustic monitoring (PAM) and animal censuses is becoming an increasingly popular area of research. Nearly all studies in this field have relied on classic audio representations and classifiers, such as Support Vector Machines (SVMs) trained on spectrograms or Mel-Frequency Cepstral Coefficients (MFCCs). In contrast, most current bioacoustic species classification exploits the power of deep learners and more cutting-edge audio representations. A significant reason for avoiding deep learning in vocal identity classification is the tiny sample size of collections of labeled individual vocalizations. As is well known, deep learners require large datasets to avoid overfitting. One way to handle small datasets with deep learning methods is to use transfer learning. In this work, we evaluate the performance of three pretrained CNNs (VGG16, ResNet50, and AlexNet) on a small, publicly available lion roar dataset containing approximately 150 samples taken from five male lions. Each of these networks is retrained on eight representations of the samples: MFCCs, the spectrogram, and the Mel spectrogram, along with several new ones, such as VGGish and Stockwell, and those based on the recently proposed LM spectrogram. The performance of these networks, both individually and in ensembles, is analyzed and corroborated using the Equal Error Rate and shown to surpass previous classification attempts on this dataset; the best single network achieved over 95% accuracy and the best ensembles over 98% accuracy. The contributions this study makes to the field of individual vocal classification include demonstrating that it is valuable and possible, with caution, to use transfer learning with single pretrained CNNs on the small datasets available for this problem domain. We also make a contribution to bioacoustics generally by offering a comparison of the performance of many state-of-the-art audio representations, including for the first time the LM spectrogram and Stockwell representations. All source code for this study is available on GitHub.
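To illustrate the kind of 2D time-frequency input the pretrained CNNs are retrained on, the following is a minimal NumPy sketch of a log-magnitude spectrogram, not the authors' pipeline (which additionally uses MFCCs, Mel, VGGish, Stockwell, and LM spectrogram representations); the synthetic "roar" signal, sampling rate, and STFT parameters are all illustrative assumptions.

```python
import numpy as np

def spectrogram(signal, n_fft=512, hop=256):
    """Magnitude spectrogram via a Hann-windowed STFT (pure NumPy).
    Frames the signal, windows each frame, and takes |rFFT| per frame."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    # Rows = frequency bins, columns = time frames
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Hypothetical one-second synthetic signal: low-frequency tone plus noise
sr = 8000
t = np.linspace(0, 1, sr, endpoint=False)
roar = np.sin(2 * np.pi * 200 * t) + 0.1 * np.random.randn(sr)

S = spectrogram(roar)
log_S = np.log1p(S)  # log compression, as typically fed to an image CNN
print(log_S.shape)   # (n_fft // 2 + 1, n_frames)
```

In a transfer-learning setup such as the one described above, an image like `log_S` would be resized to the pretrained network's expected input (e.g., 224x224 for VGG16) before only the final layers are retrained on the small labeled set.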


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/459c/9029749/a3c64699e234/jimaging-08-00096-g001.jpg
