Zhou Quan, Shi Huimin, Xiang Weikang, Kang Bin, Latecki Longin Jan
IEEE Trans Neural Netw Learn Syst. 2025 Mar;36(3):4504-4518. doi: 10.1109/TNNLS.2024.3376563. Epub 2025 Feb 28.
Recent advances in compressing high-accuracy convolutional neural networks (CNNs) have brought remarkable progress in real-time object detection. To accelerate detection, lightweight detectors typically use a single-path backbone with few convolution layers. A single-path architecture, however, involves successive pooling and downsampling operations, which often produce coarse and inaccurate feature maps that hinder object localization. In addition, because of their limited capacity, recent lightweight networks are often weak at representing large-scale visual data. To address these problems, we present a dual-path network, named DPNet, with a lightweight attention scheme for real-time object detection. The dual-path architecture extracts high-level semantic features and low-level object details in parallel. Although DPNet nearly duplicates the structure of a single-path detector, its computational cost and model size do not increase significantly. To enhance representation capability, a lightweight self-correlation module (LSCM) is designed to capture global interactions with little computational overhead and few network parameters. In the neck, LSCM is extended into a lightweight cross-correlation module (LCCM), which captures mutual dependencies among neighboring-scale features. We have conducted extensive experiments on the MS COCO, Pascal VOC 2007, and ImageNet datasets. The experimental results demonstrate that DPNet achieves a state-of-the-art trade-off between detection accuracy and implementation efficiency. More specifically, DPNet achieves 31.3% AP on MS COCO test-dev, 82.7% mAP on the Pascal VOC 2007 test set, and 41.6% mAP on the ImageNet validation set, with a model size of nearly 2.5M, 1.04 GFLOPs, and 164 and 196 frames/s (FPS) for input images of the three datasets.
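The abstract does not specify how LSCM and LCCM are built, so the following is only a generic illustration of the idea it describes: every position of a feature map attends to a small set of context positions to gather global information. It is a minimal NumPy sketch under stated assumptions, not the authors' modules. The function names (`correlate`, `self_correlation`, `avg_pool`), the average-pooling step, the scaled dot-product form, and the residual addition are all hypothetical choices, and no learned projections are included.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def avg_pool(x, k):
    # Average-pool a (C, H, W) map by factor k; H and W must be divisible by k.
    c, h, w = x.shape
    return x.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))

def correlate(query_map, context_map):
    # Hypothetical helper: each position of query_map (C, H, W) attends to
    # every position of context_map (C, H', W'); channel counts must match.
    c, h, w = query_map.shape
    q = query_map.reshape(c, -1).T            # (H*W, C) query vectors
    kv = context_map.reshape(c, -1).T         # (H'*W', C) keys and values
    attn = softmax(q @ kv.T / np.sqrt(c))     # (H*W, H'*W') correlation weights
    ctx = (attn @ kv).T.reshape(c, h, w)      # aggregated global context
    return query_map + ctx                    # residual connection (assumed)

def self_correlation(x, pool=2):
    # Assumed cost reduction: correlate the map with a pooled copy of itself,
    # so the weight matrix is H*W by H*W/pool^2 instead of H*W by H*W.
    return correlate(x, avg_pool(x, pool))
```

Under the same assumptions, a cross-scale variant in the spirit of LCCM would call `correlate(fine, coarse)` with two feature maps from neighboring scales that share a channel count, so that the finer map draws its context from the coarser one.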