Zhang Haigang, Meng Xianglong, Cao Weipeng, Liu Ye, Ming Zhong, Yang Jinfeng
Institute of Applied Artificial Intelligence of the Guangdong-Hong Kong-Macao Greater Bay Area, Shenzhen Polytechnic, Shenzhen, 518055, China.
Guangdong Laboratory of Artificial Intelligence and Digital Economy (Shenzhen), Shenzhen, 518107, China.
Neural Netw. 2023 Oct;167:129-140. doi: 10.1016/j.neunet.2023.08.023. Epub 2023 Aug 19.
Multi-label zero-shot learning (ZSL) is more realistic than standard single-label ZSL because several objects can co-exist in a natural image. Intra-class feature entanglement is a significant factor hindering the alignment of visual and semantic features, which leaves the model unable to recognize unseen samples comprehensively. We observe that existing multi-label ZSL methods emphasize attention-based refinement and decoupling of visual features while ignoring the relationships among label semantics; the use of label correlations for multi-label ZSL has not been studied in depth. In this paper, we exploit the co-occurrence relationships between category labels and build a directed weighted semantic graph from statistics and prior knowledge, in which node features represent category semantics and weighted edges represent conditional probabilities of label co-occurrence. To guide the targeted extraction of visual features, node features and edge weights are updated and refined simultaneously, and are embedded into the visual feature extraction network from both global and local perspectives. Experimental results on two challenging multi-label ZSL benchmarks, NUS-WIDE and Open Images, demonstrate the effectiveness of the proposed method. Compared with state-of-the-art models, our model achieves absolute gains of 2.4% mAP on NUS-WIDE and 2.1% mAP on Open Images.
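To make the edge definition concrete, the statistical part of such a graph can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `cooccurrence_graph`, the toy label matrix `Y`, and the choice to zero the diagonal are all assumptions, and the prior-knowledge component and the handling of unseen classes mentioned in the abstract are not modeled here. The sketch estimates the weight of the edge from label i to label j as the fraction of images containing label i that also contain label j.

```python
import numpy as np


def cooccurrence_graph(labels: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Estimate a directed adjacency matrix from a binary label matrix.

    labels: (num_images, num_labels) array of 0/1 annotations.
    Returns A with A[i, j] ~ P(label j present | label i present).
    Hypothetical helper for illustration only.
    """
    labels = labels.astype(np.float64)
    # counts[i, j] = number of images annotated with both label i and label j
    counts = labels.T @ labels
    # The diagonal holds the number of images containing each label
    occurrences = np.diag(counts).copy()
    # Row-wise normalization gives conditional probabilities; eps avoids 0/0
    adj = counts / np.maximum(occurrences[:, None], eps)
    # Assumption: drop self-loops, since P(i | i) = 1 carries no information
    np.fill_diagonal(adj, 0.0)
    return adj


# Toy annotations: 4 images, 3 labels (assumed data, not from the paper)
Y = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 1],
])
A = cooccurrence_graph(Y)
print(A)
```

In the toy data the resulting matrix is asymmetric: label 0 implies label 2 in one of three images, while label 2 implies label 0 in one of two. That asymmetry is why the graph is directed rather than undirected.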