School of Computer Science and Technology, Xidian University, Xi'an, 710071, China.
Sci Rep. 2020 Jul 9;10(1):11307. doi: 10.1038/s41598-020-67529-x.
Object detection is an important component of computer vision. Most recent successful object detection methods are based on convolutional neural networks (CNNs). To improve the performance of these networks, researchers have designed many different architectures. They found that CNN performance benefits from carefully increasing the depth and width of the network with respect to the spatial dimension. Some researchers have exploited the cardinality dimension, and others have found that skip and dense connections also benefit performance. Recently, attention mechanisms on the channel dimension have gained popularity among researchers. In SENet, global average pooling generates the input feature vector of the channel-wise attention unit. In this work, we argue that channel-wise attention can benefit from both global average pooling and global max pooling. We design three novel attention units, namely an adaptive channel-wise attention unit, an adaptive spatial-wise attention unit, and an adaptive domain attention unit, to improve CNN performance. Instead of concatenating the two attention vectors generated by the two channel-wise attention sub-units, we weight the two attention vectors based on the output data of the two sub-units. We integrated the proposed mechanism into the YOLOv3 and MobileNetv2 frameworks and tested the resulting networks on the KITTI and Pascal VOC datasets. The experimental results show that YOLOv3 with the proposed attention mechanism outperforms the original YOLOv3 by mAP values of 2.9% and 1.2% on the KITTI and Pascal VOC datasets, respectively, and MobileNetv2 with the proposed attention mechanism outperforms the original MobileNetv2 by an mAP value of 1.7% on the Pascal VOC dataset.
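The core idea of combining global average pooling and global max pooling, and then adaptively weighting the two resulting attention vectors rather than concatenating them, can be sketched as follows. This is a minimal NumPy illustration, not the paper's exact formulation: the shared bottleneck MLP, the reduction ratio `r`, and the softmax fusion over two learnable logits are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_channel_attention(feat, W1, W2, fusion_logits):
    """Hedged sketch of a channel-wise attention unit.

    feat: (C, H, W) feature map.
    W1, W2: weights of an assumed shared bottleneck MLP (C -> C/r -> C).
    fusion_logits: (2,) logits that adaptively weight the average- and
        max-pooled attention vectors (stand-in for the paper's weighting
        based on the two sub-units' output data).
    """
    avg = feat.mean(axis=(1, 2))   # global average pooling -> (C,)
    mx = feat.max(axis=(1, 2))     # global max pooling -> (C,)
    # Each pooled descriptor passes through the shared bottleneck MLP.
    a_avg = sigmoid(W2 @ np.maximum(W1 @ avg, 0.0))
    a_max = sigmoid(W2 @ np.maximum(W1 @ mx, 0.0))
    # Weight the two attention vectors instead of concatenating them.
    w = np.exp(fusion_logits) / np.exp(fusion_logits).sum()  # softmax
    attn = w[0] * a_avg + w[1] * a_max                       # (C,)
    return feat * attn[:, None, None]                        # rescale channels

# Toy usage with random weights (reduction ratio r = 2).
rng = np.random.default_rng(0)
C, r = 8, 2
feat = rng.standard_normal((C, 16, 16))
W1 = rng.standard_normal((C // r, C)) * 0.1
W2 = rng.standard_normal((C, C // r)) * 0.1
out = adaptive_channel_attention(feat, W1, W2, np.zeros(2))
print(out.shape)  # attention rescales channels, shape is preserved
```

With zero fusion logits the two attention vectors are weighted equally; in a trained network the logits (or a function of the sub-units' outputs) would shift this balance adaptively.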