Liu Yiwei, Tang Luping, Liao Chen, Zhang Chun, Guo Yingqing, Xia Yixuan, Zhang Yangyang, Yao Sisi
College of Mechanical and Electrical Engineering, Nanjing Forestry University, Nanjing 210037, China.
SEU-FEI Nano-Pico Center, Key Lab of MEMS of Ministry of Education, Southeast University, Nanjing 210096, China.
Sensors (Basel). 2023 Oct 10;23(20):8351. doi: 10.3390/s23208351.
Among interpretability techniques for image recognition, Gradient-weighted Class Activation Mapping (Grad-CAM) is widely used to localize image features and thereby expose the decision logic of a neural network, owing to its broad applicability. However, extensive experiments on a customized dataset revealed that a deep convolutional neural network (CNN) model using Grad-CAM cannot effectively resist large-scale noise interference. In this article, an optimization of the deep CNN model is proposed that incorporates the Dropkey algorithm, with Dropout used as a comparison. Compared with Grad-CAM, the improved Grad-CAM based on Dropkey applies an attention mechanism to the feature map before calculating the gradient; masking the attention scores introduces randomness and eliminates some regions. Experimental results show that the optimized Grad-CAM deep CNN model based on the Dropkey algorithm can effectively resist large-scale noise interference and localize image features accurately. For instance, under noise with a variance of 0.6, the Dropkey-enhanced ResNet50 model achieves a prediction confidence of 0.878, while the other two models exhibit confidence levels of 0.766 and 0.481, respectively. The method also performs well in visualization tasks involving difficult image characteristics such as distortion, low contrast, and small objects. It has promising prospects in practical computer vision applications: in autonomous driving, for example, it can help verify whether deep learning models correctly understand and process crucial objects, road signs, pedestrians, and other elements in the environment.
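The two ingredients named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the `drop_ratio` parameter, and the array shapes are assumptions made for illustration, and the abstract does not specify how the attention step is wired into ResNet50. The first function masks attention scores at random before the softmax (the Dropkey idea); the second computes a standard Grad-CAM heat map from feature maps and their gradients.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def dropkey_attention(scores, drop_ratio, rng):
    """Dropkey-style masking (illustrative): randomly suppress attention
    scores BEFORE the softmax, so the surviving keys are renormalized.

    scores:     (num_queries, num_keys) raw attention logits
    drop_ratio: probability of masking each logit (hypothetical parameter)
    """
    mask = rng.random(scores.shape) < drop_ratio
    # A large negative value drives masked logits to ~0 weight after softmax.
    masked = np.where(mask, scores - 1e12, scores)
    return softmax(masked, axis=-1)


def grad_cam(feature_maps, gradients):
    """Standard Grad-CAM heat map.

    feature_maps, gradients: (K, H, W) activations of the last conv layer
    and the gradient of the class score with respect to them.
    """
    # Channel weights: global average of the gradients over space.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum of the feature maps, followed by ReLU.
    cam = np.maximum(np.tensordot(weights, feature_maps, axes=1), 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

With `drop_ratio=0` the first function reduces to ordinary softmax attention, which makes the effect of the mask easy to compare. Because the mask is applied before normalization, each row of the result still sums to 1.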