Department of Electrical Engineering, University of Engineering and Technology, Peshawar, Pakistan.
Intelligent Information Processing Lab, National Center of Artificial Intelligence, University of Engineering and Technology, Peshawar, Pakistan.
PLoS One. 2024 Mar 28;19(3):e0300444. doi: 10.1371/journal.pone.0300444. eCollection 2024.
This paper presents a novel sound event detection (SED) system for rare events occurring in an open environment. Wavelet multiresolution analysis (MRA) is used to decompose a 30-second input audio clip into five levels. Wavelet denoising is then applied to the third and fifth levels of the MRA to filter out the background. Significant transitions, which may represent the onset of a rare event, are then estimated in these two levels by combining a peak-finding algorithm with the K-medoids clustering algorithm. Small portions of one-second duration, called 'chunks', are cropped from the input audio signal at the estimated locations of the significant transitions. Features are extracted from these chunks by a wavelet scattering network (WSN) and given as input to a support vector machine (SVM) classifier, which classifies them. The proposed SED framework produces an error rate comparable to that of SED systems based on convolutional neural network (CNN) architectures. The proposed algorithm is also computationally efficient and lightweight compared to deep learning models, as it has no learnable parameters. It requires only a single epoch of training, that is, 5, 10, 200, and 600 times fewer epochs than models based on CNNs and deep neural networks (DNNs), a CNN with a long short-term memory (LSTM) network, a convolutional recurrent neural network (CRNN), and a CNN, respectively. The proposed model requires neither concatenation with previous frames for anomaly detection nor the additional training data creation needed by the deep learning models used for comparison. It needs to check almost 360 times fewer chunks for the presence of rare events than the baseline systems used for comparison in this paper. All these characteristics make the proposed system suitable for real-time applications on resource-limited devices.
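The front end of the pipeline described above (decompose, denoise, locate transitions, crop chunks) can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the abstract does not name the wavelet, so a Haar transform is used; decimated detail coefficients stand in for the MRA components; the soft threshold (0.1), the peak height (0.5), the 64 Hz sample rate, and all function names are hypothetical. Only the third level is processed, and the K-medoids clustering, WSN feature extraction, and SVM classification stages are omitted.

```python
import math

def haar_step(signal):
    """One level of the Haar DWT: (approximation, detail), each half the input length."""
    s = 1 / math.sqrt(2)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail

def haar_details(signal, levels=5):
    """Detail coefficients for levels 1..levels (list index 0 holds level 1)."""
    details, approx = [], list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return details

def soft_threshold(coeffs, thr):
    """Shrink each coefficient toward zero by thr; small ones become zero."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]

def find_peaks(values, min_height):
    """Indices of local maxima of |values| that reach min_height."""
    mags = [abs(v) for v in values]
    return [i for i in range(1, len(mags) - 1)
            if mags[i] >= min_height
            and mags[i] > mags[i - 1] and mags[i] >= mags[i + 1]]

def crop_chunks(signal, sample_rate, peak_indices, level, chunk_seconds=1.0):
    """Crop fixed-length chunks centred on the samples the peaks map back to."""
    size = int(chunk_seconds * sample_rate)
    chunks = []
    for idx in peak_indices:
        centre = idx * 2 ** level  # one level-`level` coefficient spans 2**level samples
        start = max(0, min(centre - size // 2, len(signal) - size))
        chunks.append(signal[start:start + size])
    return chunks

# Synthetic 30-second clip: faint background plus one short burst at 15 s.
sample_rate = 64  # hypothetical, kept low so the example stays small
clip = [0.01 * math.sin(0.3 * i) for i in range(30 * sample_rate)]
for i in range(960, 964):
    clip[i] += 1.0

level = 3
denoised = soft_threshold(haar_details(clip, levels=5)[level - 1], thr=0.1)
peaks = find_peaks(denoised, min_height=0.5)
chunks = crop_chunks(clip, sample_rate, peaks, level)
print(peaks, [len(c) for c in chunks])
```

On this synthetic clip, a single candidate location survives denoising, so only one one-second chunk would be passed on to feature extraction and classification. This is the mechanism by which the approach avoids scanning every frame of the clip.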