Bioacoustic classification of avian calls from raw sound waveforms with an open-source deep learning architecture.

Affiliations

School of Engineering and Technology, Central Queensland University, North Rockhampton, QLD, Australia.

School of Health, Medical and Applied Sciences, Flora, Fauna and Freshwater Research, Central Queensland University, Townsville, QLD, Australia.

Publication information

Sci Rep. 2021 Aug 3;11(1):15733. doi: 10.1038/s41598-021-95076-6.

Abstract

The use of autonomous recordings of animal sounds to detect species is a popular conservation tool, constantly improving in fidelity as audio hardware and software evolve. Current classification algorithms utilise sound features extracted from the recording rather than the sound itself, with varying degrees of success. Neural networks that learn directly from raw sound waveforms have been implemented in human speech recognition, but the need for detailed labelled data has limited their use in bioacoustics. Here we test SincNet, an efficient neural network architecture that learns from the raw waveform using sinc-based filters. Results using an off-the-shelf implementation of SincNet on a publicly available bird sound dataset (NIPS4Bplus) show that the neural network converged rapidly, reaching accuracies of over 65% with limited data. After hyperparameter tuning, SincNet's performance is comparable with that of traditional methods, and it is more efficient. Learning directly from the raw waveform allows the algorithm to select automatically those elements of the sound that are best suited for the task, bypassing the onerous task of selecting feature extraction techniques and reducing possible biases. We use publicly released code and datasets to encourage others to replicate our results and to apply SincNet to their own datasets, and we review possible enhancements in the hope that algorithms that learn from the raw waveform will become useful bioacoustic tools.
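To make "sinc-based filters" concrete, the following is a minimal NumPy sketch of a single band-pass filter built as the difference of two windowed sinc low-pass filters and applied directly to a raw waveform. It is an illustration only, not the SincNet implementation used in the paper: the function names, the 16 kHz sample rate, the 251-tap kernel and the 2 to 4 kHz band are all assumptions chosen for the example, and the cut-off frequencies are fixed here, whereas SincNet learns them during training.

```python
import numpy as np


def sinc_bandpass(f_low_hz, f_high_hz, sample_rate_hz, kernel_size=251):
    """Return a Hamming-windowed sinc band-pass FIR kernel.

    Hypothetical helper for illustration; the cut-offs are fixed inputs here,
    whereas a SincNet-style layer would treat them as trainable parameters.
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd so the kernel is symmetric")
    if not 0 <= f_low_hz < f_high_hz <= sample_rate_hz / 2:
        raise ValueError("need 0 <= f_low < f_high <= Nyquist frequency")

    # Cut-offs expressed in cycles per sample.
    f1 = f_low_hz / sample_rate_hz
    f2 = f_high_hz / sample_rate_hz

    # Sample indices centred on zero: -(K-1)/2 ... (K-1)/2.
    n = np.arange(kernel_size) - (kernel_size - 1) // 2

    # np.sinc(x) is sin(pi*x)/(pi*x), so 2*f*np.sinc(2*f*n) is an ideal
    # low-pass filter with cut-off f. Subtracting two of them leaves a band.
    kernel = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)

    # The window smooths the truncation of the (infinitely long) ideal filter.
    return kernel * np.hamming(kernel_size)


def apply_filter(waveform, kernel):
    """Convolve a 1-D raw waveform with the kernel, keeping the input length."""
    return np.convolve(waveform, kernel, mode="same")


if __name__ == "__main__":
    sample_rate = 16000  # assumed sample rate for this example only
    t = np.arange(sample_rate) / sample_rate
    # Synthetic "recording": one tone inside the band, one outside it.
    waveform = np.sin(2 * np.pi * 3000 * t) + np.sin(2 * np.pi * 500 * t)
    kernel = sinc_bandpass(2000.0, 4000.0, sample_rate)
    filtered = apply_filter(waveform, kernel)
    print("input RMS:", np.sqrt(np.mean(waveform ** 2)))
    print("filtered RMS:", np.sqrt(np.mean(filtered ** 2)))
```

Because each filter is described by only two numbers (its lower and upper cut-off), a bank of such filters has far fewer free parameters than a bank of unconstrained convolution kernels of the same length, which is one plausible reason for the efficiency the abstract reports.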

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/59bf/8333097/9866b50248d3/41598_2021_95076_Fig1_HTML.jpg
