Voice Analysis in Dogs with Deep Learning: Development of a Fully Automatic Voice Analysis System for Bioacoustics Studies.

Authors

Karaaslan Mahmut, Turkoglu Bahaeddin, Kaya Ersin, Asuroglu Tunc

Affiliations

Department of Computer Engineering, Konya Technical University, 42250 Konya, Turkey.

Department of Artificial Intelligence and Data Engineering, Ankara University, 06830 Ankara, Turkey.

Publication information

Sensors (Basel). 2024 Dec 13;24(24):7978. doi: 10.3390/s24247978.

DOI:10.3390/s24247978

PMID:39771714

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11680081/

Abstract

Extracting behavioral information from animal sounds has long been a focus of bioacoustics research, because sound-derived data are crucial for understanding animal behavior and environmental interactions. Traditional methods rely on manual review of extensive recordings, which poses significant challenges. This study proposes an automated system for detecting and classifying animal vocalizations, making behavior analysis more efficient. The system first applies a preprocessing step that segments the relevant sound regions from audio recordings, then extracts features using the Short-Time Fourier Transform (STFT), Mel-frequency cepstral coefficients (MFCCs), and linear-frequency cepstral coefficients (LFCCs). These features are fed to convolutional neural network (CNN) classifiers, and classification performance is evaluated. The experiments compare five CNN models (AlexNet, DenseNet, EfficientNet, ResNet50, and ResNet152) across the feature extraction methods and demonstrate their effectiveness. The system achieves high accuracy in classifying vocal behaviors, such as barking and howling in dogs, providing a robust tool for behavioral analysis. The study highlights the importance of automated systems in bioacoustics research and suggests future improvements using deep learning-based methods for enhanced classification performance.
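The front end described in the abstract (segmentation followed by STFT, MFCC, and LFCC extraction) might be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how segmentation is performed, so the energy threshold below is a stand-in, and every function name, frame size, filter count, and the synthetic test signal are hypothetical. The CNN classification stage is omitted.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames (no padding)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def segment_by_energy(x, frame_len=400, hop=160, rel_threshold=0.1):
    """Return (start, end) sample ranges of consecutive high-energy frames.
    Assumption: a simple energy gate stands in for the paper's preprocessing."""
    energy = (frame_signal(x, frame_len, hop) ** 2).mean(axis=1)
    active = energy > rel_threshold * energy.max()
    segments, start = [], None
    for i, on in enumerate(active):
        if on and start is None:
            start = i
        elif not on and start is not None:
            segments.append((start * hop, (i - 1) * hop + frame_len))
            start = None
    if start is not None:
        segments.append((start * hop, (len(active) - 1) * hop + frame_len))
    return segments

def stft_power(x, n_fft=512, frame_len=400, hop=160):
    """Power spectrogram with a Hann window, shape (n_frames, n_fft // 2 + 1)."""
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def triangular_filterbank(n_filters, n_fft, sr, scale="mel"):
    """Triangular filters spaced evenly on the mel scale (MFCC) or in Hz (LFCC)."""
    f_max = sr / 2
    if scale == "mel":
        edges = mel_to_hz(np.linspace(0.0, hz_to_mel(f_max), n_filters + 2))
    else:
        edges = np.linspace(0.0, f_max, n_filters + 2)
    bin_freqs = np.linspace(0.0, f_max, n_fft // 2 + 1)
    fb = np.zeros((n_filters, len(bin_freqs)))
    for i in range(n_filters):
        lo, mid, hi = edges[i:i + 3]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def cepstral_coeffs(power_spec, fb, n_coeffs=13):
    """Log filterbank energies followed by an unnormalized DCT-II."""
    log_energy = np.log(power_spec @ fb.T + 1e-10)
    n = fb.shape[0]
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return log_energy @ basis.T

# Synthetic stand-in for a recording: faint noise with a 0.5 s, 600 Hz burst.
sr = 16000
rng = np.random.default_rng(0)
x = 0.001 * rng.standard_normal(2 * sr)
t = np.arange(sr // 2) / sr
x[8000:16000] += np.sin(2 * np.pi * 600 * t)

segments = segment_by_energy(x)
s, e = segments[0]
spec = stft_power(x[s:e])
mfcc = cepstral_coeffs(spec, triangular_filterbank(26, 512, sr, "mel"))
lfcc = cepstral_coeffs(spec, triangular_filterbank(26, 512, sr, "linear"))
```

In this sketch, MFCC and LFCC differ only in how the filter edges are spaced, which is why both share `cepstral_coeffs`. Each resulting array (`spec`, `mfcc`, `lfcc`) is a time-by-feature matrix that could be treated as an image-like input to a CNN classifier.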
