Gao Shan, Guo Guangqian, Huang Hanqiao, Chen C L Philip
IEEE Trans Neural Netw Learn Syst. 2025 Mar;36(3):3976-3989. doi: 10.1109/TNNLS.2022.3225180. Epub 2025 Feb 28.
Weakly supervised object classification and localization learn object classes and locations using only image-level labels rather than bounding-box annotations. Conventional deep convolutional neural network (CNN)-based methods activate the most discriminative part of an object in the feature maps and then attempt to expand the activation to the whole object, which degrades classification performance. In addition, those methods use only the most semantic information in the last feature map and ignore the role of shallow features. It therefore remains a challenge to enhance classification and localization performance with a single frame. In this article, we propose a novel hybrid network, the deep and broad hybrid network (DB-HybridNet), which combines deep CNNs with a broad learning network to learn discriminative and complementary features from different layers and then integrates multilevel features (i.e., high-level semantic features and low-level edge features) in a global feature augmentation module. Importantly, we exploit different combinations of deep features and broad learning layers in DB-HybridNet and design an iterative training algorithm based on gradient descent to ensure that the hybrid network works in an end-to-end framework. Through extensive experiments on the Caltech-UCSD Birds (CUB)-200 and ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2016 datasets, we achieve state-of-the-art classification and localization performance.
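The abstract does not give the layer-level design of DB-HybridNet, so the following is only a minimal sketch of the general idea it describes: pooling a shallow and a deep feature map into one multilevel descriptor and passing it through a broad learning layer. The structure used here (linear feature nodes, nonlinear enhancement nodes, and an output layer over their concatenation) follows the generic broad learning system layout and is an assumption, not the authors' module. All function names, node counts, and the random untrained weights are hypothetical; in the paper the network is trained iteratively with gradient descent.

```python
import math
import random


def global_avg_pool(feature_map):
    """Collapse a C x H x W nested list into a length-C vector."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def linear(x, weights, bias):
    """Compute W x + b, where weights is a list of rows."""
    return [
        sum(w * v for w, v in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


def random_matrix(rng, rows, cols):
    """Hypothetical stand-in for learned weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def broad_hybrid_scores(shallow_map, deep_map, n_feature=8, n_enhance=6,
                        n_classes=3, seed=0):
    """Return untrained class scores from pooled multilevel features."""
    rng = random.Random(seed)

    # Multilevel descriptor: low-level and high-level features side by side.
    x = global_avg_pool(shallow_map) + global_avg_pool(deep_map)

    # Feature nodes: a linear mapping of the descriptor (assumed layout).
    z = linear(x, random_matrix(rng, n_feature, len(x)), [0.0] * n_feature)

    # Enhancement nodes: a nonlinear expansion of the feature nodes.
    pre = linear(z, random_matrix(rng, n_enhance, n_feature), [0.0] * n_enhance)
    h = [math.tanh(v) for v in pre]

    # Output layer over the concatenated feature and enhancement nodes.
    broad = z + h
    return linear(broad, random_matrix(rng, n_classes, len(broad)),
                  [0.0] * n_classes)


if __name__ == "__main__":
    # Toy inputs: 4 shallow channels of 2 x 2 and 5 deep channels of 1 x 1.
    shallow = [[[0.1, 0.2], [0.3, 0.4]] for _ in range(4)]
    deep = [[[1.0]] for _ in range(5)]
    print(broad_hybrid_scores(shallow, deep))
```

The sketch keeps the broad layer wide rather than deep: capacity is added by increasing the number of feature and enhancement nodes, which is the usual motivation for pairing a broad learning layer with CNN features.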