Multiclass CNN Approach for Automatic Classification of Dolphin Vocalizations.

Authors

Di Nardo Francesco, De Marco Rocco, Li Veli Daniel, Screpanti Laura, Castagna Benedetta, Lucchetti Alessandro, Scaradozzi David

Affiliations

Dipartimento di Ingegneria dell'informazione, Università Politecnica delle Marche, 60131 Ancona, Italy.

Institute of Biological Resources and Marine Biotechnology (IRBIM), National Research Council (CNR), 60125 Ancona, Italy.

Publication information

Sensors (Basel). 2025 Apr 16;25(8):2499. doi: 10.3390/s25082499.

PMID:40285189

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12031246/

Abstract

Monitoring dolphins in the open sea is essential for understanding their behavior and the impact of human activities on marine ecosystems. Passive Acoustic Monitoring (PAM) is a non-invasive technique for tracking dolphins that provides continuous data. This study presents a novel approach for classifying dolphin vocalizations in PAM acoustic recordings using a convolutional neural network (CNN). Four types of common bottlenose dolphin (Tursiops truncatus) vocalizations were identified in underwater recordings: whistles, echolocation clicks, burst pulse sounds, and feeding buzzes. To improve classification performance, edge-detection filters were applied to the spectrograms to remove unwanted noise components. A dataset of nearly 10,000 spectrograms was used to train and test the CNN with 10-fold cross-validation. The CNN achieved an average accuracy of 95.2% and an F1-score of 87.8%. Per class, accuracy was highest for whistles (97.9%), followed by echolocation clicks (94.5%), feeding buzzes (94.0%), and burst pulse sounds (92.3%). Whistles also obtained the highest F1-score, above 95%, while the other three vocalization types kept F1-scores above 80%. This method is a promising step toward improving passive acoustic monitoring of dolphins, contributing both to species conservation and to mitigating conflicts with fisheries.
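Two steps in the abstract are easy to misread: the edge-detection filtering of spectrograms, and the gap between the average accuracy (95.2%) and the F1-score (87.8%). The Python/NumPy sketch below illustrates both under stated assumptions. It uses a generic 3×3 Sobel operator as an example edge-detection filter (the abstract does not name the filter the authors used), and it assumes per-class accuracy is computed one-vs-rest, where true negatives from the other classes count as correct. The names `sobel_edges`, `per_class_metrics`, and `CLASSES`, and the toy confusion matrix, are hypothetical and are not taken from the study.

```python
import numpy as np

# Hypothetical label order for the four vocalization types.
CLASSES = ["whistle", "click", "burst_pulse", "feeding_buzz"]


def sobel_edges(spec: np.ndarray) -> np.ndarray:
    """Gradient magnitude of a 2-D spectrogram (rows = frequency, cols = time).

    Generic 3x3 Sobel operator, used here only as an example of an
    edge-detection filter; it is an assumption, not the study's filter.
    """
    p = np.pad(spec.astype(float), 1, mode="edge")  # replicate borders
    # Gradient along time: right column minus left column, weights 1-2-1.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (
        p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2]
    )
    # Gradient along frequency: bottom row minus top row, weights 1-2-1.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (
        p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:]
    )
    return np.hypot(gx, gy)


def per_class_metrics(cm):
    """One-vs-rest accuracy and F1 per class from a confusion matrix.

    cm[i, j] counts samples of true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as the class, but wrong
    fn = cm.sum(axis=1) - tp  # belongs to the class, but missed
    tn = total - tp - fp - fn  # everything else counts as correct
    accuracy = (tp + tn) / total
    zeros = np.zeros_like(tp)
    precision = np.divide(tp, tp + fp, out=zeros.copy(), where=(tp + fp) > 0)
    recall = np.divide(tp, tp + fn, out=zeros.copy(), where=(tp + fn) > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=zeros.copy(), where=denom > 0)
    return accuracy, f1


# Toy 4-class confusion matrix: made-up counts, NOT results from the study.
toy_cm = np.array([
    [90, 5, 3, 2],
    [4, 40, 4, 2],
    [3, 5, 30, 2],
    [2, 3, 2, 23],
])
acc, f1 = per_class_metrics(toy_cm)
for name, a, f in zip(CLASSES, acc, f1):
    print(f"{name:13s} one-vs-rest accuracy={a:.3f}  F1={f:.3f}")
```

With this toy matrix, the smallest class (`feeding_buzz`, 30 samples) gets a one-vs-rest accuracy of about 0.94 but an F1 of about 0.78: the many true negatives from the other classes inflate accuracy, while F1 ignores them. That is one plausible reason why an averaged accuracy can sit well above an F1-score on the same predictions.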
