Automated classification of Tursiops aduncus whistles based on a depth-wise separable convolutional neural network and data augmentation.

Affiliations

Acoustic Science and Technology Laboratory, Harbin Engineering University, Harbin 150001, China.

Third Institute of Oceanography, Ministry of Natural Resources, Xiamen 361000, China.

Publication information

J Acoust Soc Am. 2021 Nov;150(5):3861. doi: 10.1121/10.0007291.

DOI:10.1121/10.0007291

PMID:34852567

Abstract

Whistle classification plays an essential role in studying the habitat and social behaviours of cetaceans. Using passive acoustic monitoring over a period of eight months in the Xiamen area, we obtained six categories of sweep whistles from the signals of two Tursiops aduncus individuals. First, we propose a depthwise separable convolutional neural network for whistle classification. The proposed model replaces the conventional convolution with a depthwise convolution followed by a pointwise convolution. As a result, it achieves better classification performance on sample sets whose features are relatively independent across channels, while also reducing computational complexity and the number of model parameters. Second, to address the imbalance in the number of samples across whistle categories, we propose a random series method with five audio augmentation algorithms. The generalization ability of the trained model is improved by assigning each algorithm an opening (activation) probability and by randomly selecting each augmentation factor within a specific range. Finally, we explore the effect of the proposed augmentation method on the performance of the proposed architecture and find that it raises the accuracy to as much as 98.53% for the classification of Tursiops aduncus whistles.
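The two ideas in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. The first part counts weights (bias ignored) for a conventional convolution versus a depthwise convolution followed by a pointwise convolution. The second part shows a random augmentation series in which each step fires with its own probability and draws its factor from a range. The abstract does not name the five algorithms, so the three augmentations here (`gain`, `add_noise`, `time_shift`), the probabilities, and the ranges in `CHAIN` are all hypothetical placeholders.

```python
import random


def conv_params(k, c_in, c_out):
    """Weights in a conventional k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k conv plus a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


# Hypothetical augmentations on a waveform stored as a list of floats.
# Each takes the signal, a factor drawn from its range, and the RNG.
def gain(x, factor, rng):
    """Scale the amplitude by `factor`."""
    return [s * factor for s in x]


def add_noise(x, factor, rng):
    """Add Gaussian noise with standard deviation `factor`."""
    return [s + rng.gauss(0.0, factor) for s in x]


def time_shift(x, factor, rng):
    """Circularly shift the signal by a fraction `factor` of its length."""
    if not x:
        return x
    n = int(factor * len(x)) % len(x)
    return x[n:] + x[:n]


# (function, activation probability, (low, high) factor range).
# All values below are assumptions chosen for illustration only.
CHAIN = [
    (gain, 0.5, (0.5, 1.5)),
    (add_noise, 0.5, (0.001, 0.01)),
    (time_shift, 0.5, (0.0, 0.2)),
]


def augment(x, chain=CHAIN, rng=None):
    """Apply each augmentation in order with its own probability."""
    if rng is None:
        rng = random.Random()
    for fn, prob, (low, high) in chain:
        if rng.random() < prob:
            x = fn(x, rng.uniform(low, high), rng)
    return x


if __name__ == "__main__":
    # 3 x 3 kernel, 32 input channels, 64 output channels
    print(conv_params(3, 32, 64), separable_params(3, 32, 64))
    print(augment([0.0, 0.5, 1.0, 0.5], rng=random.Random(0)))
```

For a 3 x 3 kernel with 32 input and 64 output channels, the conventional layer has 18432 weights and the separable pair has 2336, which is the kind of parameter reduction the abstract refers to. Calling `augment` repeatedly on samples from an under-represented whistle category would yield varied copies, since both the set of active steps and their factors change from call to call.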
