Bioacoustic classification of avian calls from raw sound waveforms with an open-source deep learning architecture.

Affiliations

School of Engineering and Technology, Central Queensland University, North Rockhampton, QLD, Australia.

School of Health, Medical and Applied Sciences, Flora, Fauna and Freshwater Research, Central Queensland University, Townsville, QLD, Australia.

Publication information

Sci Rep. 2021 Aug 3;11(1):15733. doi: 10.1038/s41598-021-95076-6.

DOI:10.1038/s41598-021-95076-6

PMID:34344970

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC8333097/

Abstract

The use of autonomous recordings of animal sounds to detect species is a popular conservation tool, constantly improving in fidelity as audio hardware and software evolve. Current classification algorithms utilise sound features extracted from the recording rather than the sound itself, with varying degrees of success. Neural networks that learn directly from raw sound waveforms have been implemented in human speech recognition, but their need for detailed labelled data has limited their use in bioacoustics. Here we test SincNet, an efficient neural network architecture that learns from the raw waveform using sinc-based filters. Results using an off-the-shelf implementation of SincNet on a publicly available bird sound dataset (NIPS4Bplus) show that the neural network converged rapidly, reaching accuracies of over 65% with limited data. The network's performance is comparable with that of traditional methods after hyperparameter tuning, but it is more efficient. Learning directly from the raw waveform allows the algorithm to automatically select those elements of the sound that are best suited for the task, bypassing the onerous task of selecting feature extraction techniques and reducing possible biases. We use publicly released code and datasets to encourage others to replicate our results and to apply SincNet to their own datasets, and we review possible enhancements in the hope that algorithms that learn from the raw waveform will become useful bioacoustic tools.
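
To make the idea of a sinc-based filter concrete, the following is a minimal NumPy sketch of a windowed-sinc band-pass kernel that is fully described by two cutoff frequencies, which is the kind of first-layer filter SincNet is generally described as learning. It is an illustration only, not the authors' code: the function names, the 44.1 kHz sample rate, the kernel length of 251, the 2 to 4 kHz band and the synthetic test tones are all assumptions chosen for the example, and in a trained network the cutoffs would be learned rather than fixed by hand.

```python
import numpy as np


def sinc_bandpass(f_low_hz, f_high_hz, sample_rate, kernel_size=251):
    """Build a band-pass FIR kernel from two cutoff frequencies (hypothetical helper)."""
    # Cutoffs expressed in cycles per sample.
    f1 = f_low_hz / sample_rate
    f2 = f_high_hz / sample_rate
    # Sample indices centred on zero so the kernel is symmetric.
    n = np.arange(kernel_size) - (kernel_size - 1) / 2
    # Difference of two ideal low-pass filters.
    # Note: np.sinc(x) is the normalised sinc, sin(pi * x) / (pi * x).
    kernel = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    # A Hamming window smooths the truncation of the ideal (infinite) filter.
    return kernel * np.hamming(kernel_size)


def band_energy(waveform, kernel):
    """Mean squared amplitude of the waveform after filtering (hypothetical helper)."""
    filtered = np.convolve(waveform, kernel, mode="same")
    return float(np.mean(filtered ** 2))


# Assumed values for illustration only.
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate      # one second of samples
in_band = np.sin(2 * np.pi * 3000 * t)        # tone inside the 2-4 kHz band
out_band = np.sin(2 * np.pi * 500 * t)        # tone well below the band

kernel = sinc_bandpass(2000, 4000, sample_rate)
energy_in = band_energy(in_band, kernel)
energy_out = band_energy(out_band, kernel)
print(f"in-band energy: {energy_in:.4f}, out-of-band energy: {energy_out:.6f}")
```

Because each kernel depends on only two numbers, a bank of such filters has far fewer free parameters than a bank of unconstrained convolution kernels of the same length, which is one plausible reading of why the abstract calls the architecture efficient.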

Similar articles

BioCPPNet: automatic bioacoustic source separation with deep neural networks.

Sci Rep. 2021 Dec 6;11(1):23502. doi: 10.1038/s41598-021-02790-2.

Robust sound event detection in bioacoustic sensor networks.

PLoS One. 2019 Oct 24;14(10):e0214168. doi: 10.1371/journal.pone.0214168. eCollection 2019.

PROTAX-Sound: A probabilistic framework for automated animal sound identification.

PLoS One. 2017 Sep 1;12(9):e0184048. doi: 10.1371/journal.pone.0184048. eCollection 2017.

Exploiting deep neural network and long short-term memory methodologies in bioacoustic classification of LPC-based features.

PLoS One. 2021 Dec 23;16(12):e0259140. doi: 10.1371/journal.pone.0259140. eCollection 2021.

Using SincNet for Learning Pathological Voice Disorders.

Sensors (Basel). 2022 Sep 2;22(17):6634. doi: 10.3390/s22176634.

An extended clinical EEG dataset with 15,300 automatically labelled recordings for pathology decoding.

Neuroimage Clin. 2023;39:103482. doi: 10.1016/j.nicl.2023.103482. Epub 2023 Jul 28.

ORCA-SPOT: An Automatic Killer Whale Sound Detection Toolkit Using Deep Learning.

Sci Rep. 2019 Jul 29;9(1):10997. doi: 10.1038/s41598-019-47335-w.

5G AI-IoT System for Bird Species Monitoring and Song Classification.

Sensors (Basel). 2024 Jun 6;24(11):3687. doi: 10.3390/s24113687.

Interpretable SincNet-based Deep Learning for Emotion Recognition from EEG brain activity.

Annu Int Conf IEEE Eng Med Biol Soc. 2021 Nov;2021:412-415. doi: 10.1109/EMBC46164.2021.9630427.

Cited by

AI-Powered Vocalization Analysis in Poultry: Systematic Review of Health, Behavior, and Welfare Monitoring.

Sensors (Basel). 2025 Jun 29;25(13):4058. doi: 10.3390/s25134058.

Applications and Considerations of Artificial Intelligence in Veterinary Sciences: A Narrative Review.

Vet Med Sci. 2025 May;11(3):e70315. doi: 10.1002/vms3.70315.

Elephant Sound Classification Using Deep Learning Optimization.

Sensors (Basel). 2025 Jan 9;25(2):352. doi: 10.3390/s25020352.

Modelling reindeer rut activity using on-animal acoustic recorders and machine learning.

Ecol Evol. 2024 Jun 25;14(6):e11479. doi: 10.1002/ece3.11479. eCollection 2024 Jun.

callsync: An R package for alignment and analysis of multi-microphone animal recordings.

Ecol Evol. 2024 May 23;14(5):e11384. doi: 10.1002/ece3.11384. eCollection 2024 May.

From beasts to bytes: Revolutionizing zoological research with artificial intelligence.

Zool Res. 2023 Nov 18;44(6):1115-1131. doi: 10.24272/j.issn.2095-8137.2023.263.

Who is calling? Optimizing source identification from marmoset vocalizations with hierarchical machine learning classifiers.

J R Soc Interface. 2023 Oct;20(207):20230399. doi: 10.1098/rsif.2023.0399. Epub 2023 Oct 18.

A Review of Automated Bioacoustics and General Acoustics Classification Research.

Sensors (Basel). 2022 Oct 31;22(21):8361. doi: 10.3390/s22218361.

Improving Misfire Fault Diagnosis with Cascading Architectures via Acoustic Vehicle Characterization.

Sensors (Basel). 2022 Oct 12;22(20):7736. doi: 10.3390/s22207736.

A ResNet attention model for classifying mosquitoes from wing-beating sounds.

Sci Rep. 2022 Jun 20;12(1):10334. doi: 10.1038/s41598-022-14372-x.
