Chen Vincent S, Varma Paroma, Krishna Ranjay, Bernstein Michael, Ré Christopher, Fei-Fei Li
Stanford University.
Proc IEEE Int Conf Comput Vis. 2019 Oct-Nov;2019:2580-2590. doi: 10.1109/iccv.2019.00267. Epub 2020 Feb 27.
Visual knowledge bases such as Visual Genome power numerous applications in computer vision, including visual question answering and captioning, but suffer from sparse, incomplete relationships. All scene graph models to date are limited to training on a small set of visual relationships that have thousands of training labels each. Hiring human annotators is expensive, and textual knowledge base completion methods are incompatible with visual data. In this paper, we introduce a semi-supervised method that assigns probabilistic relationship labels to a large number of unlabeled images using few labeled examples. We analyze visual relationships to suggest two types of image-agnostic features that are used to generate noisy heuristics, whose outputs are aggregated using a factor graph-based generative model. With as few as 10 labeled examples per relationship, the generative model creates enough training data to train any existing state-of-the-art scene graph model. We demonstrate that our method outperforms all baseline approaches on scene graph prediction by 5.16 recall@100 for PREDCLS. In our limited label setting, we define a complexity metric for relationships that serves as an indicator (R = 0.778) for conditions under which our method succeeds over transfer learning, the de facto approach for training with limited labels.
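The pipeline the abstract describes can be sketched as follows: image-agnostic heuristics vote on unlabeled (subject, object) box pairs, and their noisy, possibly abstaining votes are aggregated into a probabilistic label. This is a minimal sketch; an accuracy-weighted log-odds vote stands in for the paper's factor graph-based generative model, and the heuristic names, spatial features, and accuracy values are illustrative assumptions, not the authors' exact rules.

```python
# Weak-supervision sketch: noisy heuristics produce votes on a candidate
# relationship (e.g. "above"), which are combined into a probabilistic label.
import math
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)
ABSTAIN, NEG, POS = -1, 0, 1

def above_heuristic(sub: Box, obj: Box) -> int:
    # Fires POS if the subject's center lies above the object's center
    # (hypothetical spatial rule, for illustration only).
    return POS if sub[1] + sub[3] / 2 < obj[1] + obj[3] / 2 else NEG

def overlap_heuristic(sub: Box, obj: Box) -> int:
    # Abstains unless the boxes overlap horizontally.
    if sub[0] < obj[0] + obj[2] and obj[0] < sub[0] + sub[2]:
        return POS
    return ABSTAIN

def aggregate(heuristics: List[Callable[[Box, Box], int]],
              accuracies: List[float], sub: Box, obj: Box) -> float:
    """Accuracy-weighted log-odds vote over non-abstaining heuristics,
    squashed to a probability; a simplified stand-in for the generative model."""
    log_odds = 0.0
    for h, acc in zip(heuristics, accuracies):
        vote = h(sub, obj)
        if vote == ABSTAIN:
            continue
        weight = math.log(acc / (1 - acc))  # more accurate heuristics weigh more
        log_odds += weight if vote == POS else -weight
    return 1.0 / (1.0 + math.exp(-log_odds))  # P(relationship holds)

heuristics = [above_heuristic, overlap_heuristic]
accuracies = [0.8, 0.7]  # in the paper, learned by the generative model
p = aggregate(heuristics, accuracies, (10, 0, 20, 20), (12, 40, 20, 20))
```

Probabilistic labels like `p` can then serve as soft training targets for any downstream scene graph model, which is how a handful of labeled examples gets amplified into a full training set.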