Peng Lujie, Yang Junyu, Yan Longke, Chen Zhiyi, Xiao Jianbiao, Zhou Liang, Zhou Jun
Department of Internet of Things Engineering, School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China.
Sensors (Basel). 2023 Jul 28;23(15):6767. doi: 10.3390/s23156767.
In recent years, environmental sound classification (ESC) has become prevalent in many artificial intelligence Internet of Things (AIoT) applications, because environmental sound carries a wealth of information that can be used to detect particular events. However, existing ESC methods have high computational complexity and are not suitable for deployment on AIoT devices with constrained computing resources. A model that combines high classification accuracy with low computational complexity is therefore needed. In this work, a new ESC method named BSN-ESC is proposed. It comprises a big-small network-based ESC model, which assesses the classification difficulty level and adaptively activates either a big or a small network for classification, and a pre-classification processing technique with log-mel spectrogram refining, which prevents distortion of the frequency-domain characteristics at the junction between two adjacent sound clips. With the proposed methods, computational complexity is significantly reduced while classification accuracy remains high. The BSN-ESC model is implemented on both a CPU and an FPGA to evaluate its performance on PC and embedded systems using ESC-50, the most commonly used dataset. The model achieves the lowest computational complexity, requiring only 0.123 G floating-point operations (FLOPs), a reduction of up to 2309 times compared with state-of-the-art methods, while delivering a high classification accuracy of 89.25%. This work makes it practical to apply ESC on AIoT devices with constrained computational resources.
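The adaptive big-small idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how classification difficulty is assessed, so the code below assumes a common stand-in: the small network runs first, and the big network is activated only when the small network's top softmax probability falls below a threshold. The names `big_small_classify`, `expected_cost`, `small_net`, `big_net`, and `threshold` are hypothetical and do not come from the paper.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def big_small_classify(x, small_net, big_net, threshold=0.9):
    """Return (predicted_label, used_big_network).

    Assumption: difficulty is judged by the small network's confidence.
    small_net and big_net are any callables mapping an input to a 1-D
    array of class logits.
    """
    probs = softmax(np.asarray(small_net(x), dtype=float))
    if probs.max() >= threshold:
        # Easy clip: accept the small network's answer.
        return int(probs.argmax()), False
    # Hard clip: fall back to the big network.
    probs = softmax(np.asarray(big_net(x), dtype=float))
    return int(probs.argmax()), True

def expected_cost(small_cost, big_cost, frac_hard):
    # Average cost per clip when the small network always runs and the
    # big network runs only on the fraction of clips judged hard.
    return small_cost + frac_hard * big_cost
```

Under this assumption the average cost per clip stays close to that of the small network whenever most clips are judged easy, which is the intuition behind pairing a cheap model with an expensive one. The threshold trades accuracy against computation and would need to be tuned on validation data.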