Convolutional Recurrent Neural Network-Based Event Detection in Tunnels Using Multiple Microphones

Author information

Kim Nam Kyun, Jeon Kwang Myung, Kim Hong Kook

Affiliation

School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju 61005, Korea.

Publication information

Sensors (Basel). 2019 Jun 14;19(12):2695. doi: 10.3390/s19122695.

This paper proposes a sound event detection (SED) method for tunnels to help prevent further uncontrollable accidents. Tunnel accidents are accompanied by crashes and tire skids, which usually produce abnormal sounds. Because the tunnel environment always has a severe level of noise, the detection accuracy of existing methods can be greatly reduced. To deal with the noise in the tunnel environment, the proposed method combines preprocessing of the tunnel acoustic signals with a classifier that detects acoustic events in tunnels. For preprocessing, a non-negative tensor factorization (NTF) technique is used to separate the acoustic event signal from the noisy signal in the tunnel. In particular, the NTF technique developed in this paper consists of source separation and online noise learning; in other words, the noise basis is adapted by an online noise learning technique for enhancement in adverse noise conditions. Next, a convolutional recurrent neural network (CRNN) is extended to accommodate the contributions of both the separated event signal and the noise to event detection; thus, the proposed CRNN is composed of event convolution layers and noise convolution layers in parallel, followed by recurrent layers and the output layer. Here, a set of mel-filterbank feature parameters is used as the input features. The proposed method is evaluated on two datasets: a publicly available road audio events dataset and a tunnel audio dataset recorded in a real traffic tunnel for six months. In the first evaluation, where the background noise is low, the proposed CRNN-based SED method with online noise learning reduces the relative recognition error rate by 56.25% when compared to the conventional CRNN-based method with noise. In the second evaluation, where the tunnel background noise is more severe than in the first evaluation, the proposed CRNN-based SED method yields superior performance when compared to the conventional methods. In particular, among all of the compared methods, the proposed method with online noise learning provides the best recognition rate of 91.07% and reduces the recognition error rates by 47.40% and 28.56% when compared to the Gaussian mixture model (GMM)-hidden Markov model (HMM)-based and conventional CRNN-based SED methods, respectively. The computational complexity measurements also show that the proposed CRNN-based SED method requires a processing time of 599 ms for both the NTF-based source separation with online noise learning and the CRNN classification when the noisy tunnel signal is one second long, which implies that the proposed method detects events in real time.
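The figures reported for the second evaluation can be cross-checked with a little arithmetic. The sketch below is illustrative only and rests on two assumptions that the abstract does not state explicitly: that the recognition error rate equals 100% minus the recognition rate, and that the quoted reductions are relative, that is, (baseline error - proposed error) / baseline error. The function names are hypothetical and do not come from the paper.

```python
# Figures quoted in the abstract (second evaluation, tunnel dataset).
PROPOSED_RECOGNITION_RATE = 91.07   # percent
REDUCTION_VS_GMM_HMM = 47.40        # percent, assumed to be a relative error reduction
REDUCTION_VS_CRNN = 28.56           # percent, assumed to be a relative error reduction
PROCESSING_TIME_MS = 599.0          # processing time per input segment
SEGMENT_LENGTH_MS = 1000.0          # one-second noisy tunnel signal


def implied_baseline_rate(proposed_rate, relative_reduction):
    """Recognition rate of a baseline implied by a relative error reduction.

    Assumes error = 100 - recognition rate, and
    reduction = (baseline_error - proposed_error) / baseline_error.
    """
    proposed_error = 100.0 - proposed_rate
    baseline_error = proposed_error / (1.0 - relative_reduction / 100.0)
    return 100.0 - baseline_error


def real_time_factor(processing_ms, segment_ms):
    """Processing time divided by signal duration; a value below 1 keeps up with the input."""
    return processing_ms / segment_ms


gmm_hmm_rate = implied_baseline_rate(PROPOSED_RECOGNITION_RATE, REDUCTION_VS_GMM_HMM)
crnn_rate = implied_baseline_rate(PROPOSED_RECOGNITION_RATE, REDUCTION_VS_CRNN)
rtf = real_time_factor(PROCESSING_TIME_MS, SEGMENT_LENGTH_MS)

print(f"proposed error rate:       {100.0 - PROPOSED_RECOGNITION_RATE:.2f}%")
print(f"implied GMM-HMM rate:      {gmm_hmm_rate:.2f}%")
print(f"implied conventional CRNN: {crnn_rate:.2f}%")
print(f"real-time factor:          {rtf:.3f}")
```

Under these assumptions, the implied baseline recognition rates come out to roughly 83.02% for the GMM-HMM-based method and 87.50% for the conventional CRNN-based method. These are derived values, not numbers reported in the abstract. The real-time factor of 0.599 is consistent with the claim that a one-second signal is processed in less than one second.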
