Ijaz Nouman, Banoori Farhad, Koo Insoo
Department of Electrical, Electronics and Computer Engineering, University of Ulsan, Ulsan 44610, Republic of Korea.
School of Electronics and Information Engineering, South China University of Technology, Guangzhou 510641, China.
Bioengineering (Basel). 2024 Jul 5;11(7):685. doi: 10.3390/bioengineering11070685.
Bioacoustic event detection is a demanding task that involves recognizing and classifying the sounds animals make in their natural habitats. Traditional supervised learning requires a large amount of labeled data, which is hard to come by in bioacoustics. This paper presents a few-shot learning (FSL) method that combines transductive inference and data augmentation to address the problems of too few labeled events and small volumes of recordings. Here, transductive inference iteratively updates the class prototypes and feature extractors to capture essential patterns, whereas data augmentation applies SpecAugment to Mel spectrogram features to enlarge the training data. The proposed approach is evaluated on the Detecting and Classifying Acoustic Scenes and Events (DCASE) 2022 and 2021 datasets. Extensive experimental results demonstrate that the proposed method, with all of its components, achieves significant F-score improvements of 27% and 10% on the DCASE-2022 and DCASE-2021 datasets, respectively, compared with recent advanced approaches. Moreover, the method is well suited to FSL tasks because it adapts effectively to sounds from various animal species, recordings, and durations.
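The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, mask widths, and the soft-assignment update rule (a soft k-means style refinement) are assumptions chosen for illustration, the time warping step of SpecAugment is left out, and the feature-extractor update mentioned in the abstract is omitted. Plain Python lists stand in for Mel spectrograms and embeddings.

```python
import math
import random

def spec_augment(mel, max_freq_width=2, max_time_width=2, rng=None):
    """Return a masked copy of `mel` (rows = mel bins, columns = frames).

    Only the masking part of SpecAugment is sketched: one band of mel bins
    and one band of frames are set to zero. Widths are hypothetical defaults.
    """
    rng = rng or random.Random()
    n_mels, n_frames = len(mel), len(mel[0])
    out = [row[:] for row in mel]  # copy so the input is left untouched
    # Frequency mask: zero f consecutive mel bins starting at f0.
    f = rng.randint(0, min(max_freq_width, n_mels))
    f0 = rng.randint(0, n_mels - f)
    for i in range(f0, f0 + f):
        out[i] = [0.0] * n_frames
    # Time mask: zero t consecutive frames starting at t0.
    t = rng.randint(0, min(max_time_width, n_frames))
    t0 = rng.randint(0, n_frames - t)
    for row in out:
        for j in range(t0, t0 + t):
            row[j] = 0.0
    return out

def sq_dist(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def soft_assign(query, prototypes):
    """Softmax over negative squared distances to each prototype."""
    logits = [-sq_dist(query, p) for p in prototypes]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def refine_prototypes(support, labels, queries, n_classes, n_iters=5):
    """Iteratively refine class prototypes using unlabeled query embeddings.

    Assumes every class has at least one labeled support embedding.
    Labeled examples count with weight 1; each query contributes to every
    class in proportion to its current soft assignment.
    """
    by_class = [[s for s, y in zip(support, labels) if y == c]
                for c in range(n_classes)]
    # Initial prototypes: mean of the labeled support embeddings per class.
    protos = [[sum(col) / len(m) for col in zip(*m)] for m in by_class]
    for _ in range(n_iters):
        weights = [soft_assign(q, protos) for q in queries]
        updated = []
        for c, members in enumerate(by_class):
            mass = float(len(members))
            acc = [sum(col) for col in zip(*members)]
            for q, w in zip(queries, weights):
                mass += w[c]
                acc = [a + w[c] * x for a, x in zip(acc, q)]
            updated.append([a / mass for a in acc])
        protos = updated
    return protos
```

In a few-shot detection setting, `support` would hold embeddings of the few labeled events, `queries` embeddings of unlabeled segments from the same recording, and each prototype would drift toward the query embeddings assigned to its class.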