EECS Department, Florida Atlantic University, Boca Raton, FL 33431, USA.
ECE Department, Georgia Tech, Atlanta, GA 30332, USA.
Sensors (Basel). 2021 Nov 3;21(21):7320. doi: 10.3390/s21217320.
Distinguishing a dangerous audio event, such as a gunshot, from a non-life-threatening one, such as a plastic bag bursting, can mean the difference between life and death, and it determines whether deploying public safety personnel is necessary or unnecessary. Sounds generated by plastic bag explosions are often confused with real gunshot sounds, by humans and computer algorithms alike. As a case study, the research reported in this paper offers insight into the sounds of plastic bag explosions and gunshots. An experiment shows that a deep learning-based classification model trained on a popular urban sound dataset containing gunshot sounds cannot distinguish plastic bag pop sounds from gunshot sounds. The study further shows that the same deep learning model, when trained on a dataset that contains plastic bag pop sounds, can effectively detect these non-life-threatening sounds. For this purpose, a collection of plastic bag-popping sounds was first recorded in different environments with varying parameters, such as plastic bag size and distance from the recording microphones. The audio clips ranged in duration from 400 ms to 600 ms. This collection was then used, together with a gunshot sound dataset, to train a classification model based on a convolutional neural network (CNN) to differentiate life-threatening gunshot events from non-life-threatening plastic bag explosion events. Two feature extraction methods, Mel-frequency cepstral coefficients (MFCC) and Mel-spectrograms, were also compared. The experiments show that once plastic bag pop sounds are included in model training, the CNN classification model performs well in distinguishing actual gunshot sounds from plastic bag sounds.
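To make the relationship between the two compared feature types concrete, the following is a minimal sketch, not the authors' pipeline. It computes one frame of a log Mel-spectrogram and derives MFCCs from it with a discrete cosine transform, using only the Python standard library. The sample rate, frame length, number of Mel bands, number of coefficients, and the synthetic 1 kHz test tone are all illustrative assumptions; the abstract does not state the parameters actually used.

```python
import cmath
import math

def hz_to_mel(hz):
    # Common (HTK-style) Hz-to-mel mapping.
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of one Hann-windowed frame via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spectrum.append(abs(s) ** 2)
    return spectrum

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    mel_max = hz_to_mel(sample_rate / 2)
    edges_hz = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def log_mel_frame(frame, sample_rate, n_mels):
    """One column of a log Mel-spectrogram."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(n_mels, len(frame), sample_rate)
    # Small floor avoids log(0) for empty bands.
    return [math.log(sum(w * p for w, p in zip(row, spectrum)) + 1e-10)
            for row in bank]

def mfcc_from_log_mel(log_mel, n_mfcc):
    """Unnormalized DCT-II of the log-mel energies; keep the first n_mfcc terms."""
    n = len(log_mel)
    return [sum(v * math.cos(math.pi * k * (j + 0.5) / n)
                for j, v in enumerate(log_mel))
            for k in range(n_mfcc)]

# Assumed values for illustration only: 8 kHz audio, 256-sample frame, 1 kHz tone.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(256)]

log_mel = log_mel_frame(frame, sample_rate, n_mels=20)  # Mel-spectrogram column
mfcc = mfcc_from_log_mel(log_mel, n_mfcc=13)            # compact cepstral summary
```

In this sketch the Mel-spectrogram column keeps the full per-band energy profile, while the MFCC vector is a lower-dimensional, decorrelated summary of that same profile. A CNN could consume either representation stacked over successive frames of a clip.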