Open Visual Knowledge Extraction via Relation-Oriented Multimodality Model Prompting.

Authors

Cui Hejie, Fang Xinyu, Zhang Zihan, Xu Ran, Kan Xuan, Liu Xin, Yu Yue, Li Manling, Song Yangqiu, Yang Carl

Affiliations

Emory University.

Tongji University.

Publication information

Adv Neural Inf Process Syst. 2023 Dec;36:23499-23519. Epub 2024 May 30.

PMID: 39130613

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11315466/

Abstract

Images contain rich relational knowledge that can help machines understand the world. Existing methods on visual knowledge extraction often rely on the pre-defined format (e.g., sub-verb-obj tuples) or vocabulary (e.g., relation types), restricting the expressiveness of the extracted knowledge. In this work, we take a first exploration to a new paradigm of open visual knowledge extraction. To achieve this, we present OpenVik which consists of an open relational region detector to detect regions potentially containing relational knowledge and a visual knowledge generator that generates format-free knowledge by prompting the large multimodality model with the detected region of interest. We also explore two data enhancement techniques for diversifying the generated format-free visual knowledge. Extensive knowledge quality evaluations highlight the correctness and uniqueness of the extracted open visual knowledge by OpenVik. Moreover, integrating our extracted knowledge across various visual reasoning applications shows consistent improvements, indicating the real-world applicability of OpenVik.

Similar articles

Open Visual Knowledge Extraction via Relation-Oriented Multimodality Model Prompting.

Adv Neural Inf Process Syst. 2023 Dec;36:23499-23519. Epub 2024 May 30.

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

A Novel Tensor Learning Model for Joint Relational Triplet Extraction.

IEEE Trans Cybern. 2024 Apr;54(4):2483-2494. doi: 10.1109/TCYB.2023.3265851. Epub 2024 Mar 18.

Impact of summer programmes on the outcomes of disadvantaged or 'at risk' young people: A systematic review.

Campbell Syst Rev. 2024 Jun 13;20(2):e1406. doi: 10.1002/cl2.1406. eCollection 2024 Jun.

Bootstrapping Knowledge Graphs From Images and Text.

Front Neurorobot. 2019 Nov 12;13:93. doi: 10.3389/fnbot.2019.00093. eCollection 2019.

Translational Metabolomics of Head Injury: Exploring Dysfunctional Cerebral Metabolism with Ex Vivo NMR Spectroscopy-Based Metabolite Quantification.

Room-Object Entity Prompting and Reasoning for Embodied Referring Expression.

IEEE Trans Pattern Anal Mach Intell. 2024 Feb;46(2):994-1010. doi: 10.1109/TPAMI.2023.3326851. Epub 2024 Jan 8.

Two Computational Approaches to Visual Analogy: Task-Specific Models Versus Domain-General Mapping.

Cogn Sci. 2023 Sep;47(9):e13347. doi: 10.1111/cogs.13347.

Relational Temporal Graph Reasoning for Dual-Task Dialogue Language Understanding.

IEEE Trans Pattern Anal Mach Intell. 2023 Nov;45(11):13170-13184. doi: 10.1109/TPAMI.2023.3289509. Epub 2023 Oct 3.

Fast detection of the main anatomical structures in digital retinal images based on intra- and inter-structure relational knowledge.

Comput Methods Programs Biomed. 2017 Oct;149:55-68. doi: 10.1016/j.cmpb.2017.06.022. Epub 2017 Jul 22.

References cited in this article

Weakly-Supervised Scientific Document Classification via Retrieval-Augmented Multi-Stage Training.

Int ACM SIGIR Conf Res Dev Inf Retr. 2023 Jul;2023:2501-2505. doi: 10.1145/3539618.3592085. Epub 2023 Jul 18.

Neighborhood-Regularized Self-Training for Learning with Few Labels.

Proc AAAI Conf Artif Intell. 2023 Jun 27;37(9):10611-10619. doi: 10.1609/aaai.v37i9.26260.

Counterfactual and Factual Reasoning over Hypergraphs for Interpretable Clinical Predictions on EHR.

Proc Mach Learn Res. 2022 Nov;193:259-278.

SumGNN: multi-typed drug interaction prediction via efficient knowledge graph summarization.

Bioinformatics. 2021 Sep 29;37(18):2988-2995. doi: 10.1093/bioinformatics/btab207.

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.

IEEE Trans Pattern Anal Mach Intell. 2017 Jun;39(6):1137-1149. doi: 10.1109/TPAMI.2016.2577031. Epub 2016 Jun 6.
