Abstract visual reasoning based on algebraic methods.

Authors

Zheng Mingyang, Wan Weibing, Fang Zhijun

Affiliations

School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, 201620, China.

School of Computer Science and Technology, Donghua University, Shanghai, 201620, China.

Publication information

Sci Rep. 2025 Jan 28;15(1):3482. doi: 10.1038/s41598-025-86804-3.

DOI: 10.1038/s41598-025-86804-3

PMID: 39875490

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11775302/

Abstract

Extracting high-order abstract patterns from complex high-dimensional data forms the foundation of human cognitive abilities. Abstract visual reasoning involves identifying abstract patterns embedded within composite images, considered a core competency of machine intelligence. Traditional neuro-symbolic methods often infer unknown objects through data fitting, without fully exploring the abstract patterns within composite images and the sequential sensitivity of visual sequences. This paper constructs a relation model with object-centric inductive biases, learning end-to-end multi-granular rule embeddings at different levels. Through a gating fusion module, the model incrementally integrates explicit representations of objects and abstract relationships. The model incorporates a relational bottleneck method from information theory, separating the input perceptual information from the embeddings of abstract representations, thereby restricting and differentiating feature processing to encourage relational comparisons and induce the extraction of abstract patterns. Furthermore, this paper bridges algebraic operations and machine reasoning through the relational bottleneck method, extracting common patterns of multi-visual objects by identifying invariant sequences within the relational bottleneck matrix. Experimental results on the I-RAVEN dataset demonstrate a total accuracy of 96.8%, surpassing state-of-the-art baseline methods and exceeding human performance at 84.4%.
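
The abstract gives no architectural details, so the following is only an illustrative sketch of the two ideas it names, not the authors' implementation. It assumes a common reading of the relational bottleneck, in which downstream layers see pairwise similarities between object embeddings rather than the embeddings themselves, and a generic sigmoid gate for fusing object and relation features. All function names, shapes, and the random weights (`relation_matrix`, `gated_fusion`, `w_rel`, `w_gate`) are hypothetical.

```python
import numpy as np


def relation_matrix(z):
    """Pairwise cosine similarities between object embeddings (rows of z).

    Assumed form of the bottleneck: only relations pass through,
    not the raw perceptual features.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return z @ z.T


def gated_fusion(obj, rel, w, b):
    """Blend object and relation features with an element-wise sigmoid gate."""
    logits = np.concatenate([obj, rel], axis=-1) @ w + b
    gate = 1.0 / (1.0 + np.exp(-logits))
    return gate * obj + (1.0 - gate) * rel


rng = np.random.default_rng(0)
n_objects, dim = 8, 16  # illustrative sizes, not taken from the paper

# Stand-in for object embeddings produced by a perception module.
z = rng.normal(size=(n_objects, dim))
r = relation_matrix(z)

# Rotating the feature space changes every embedding but leaves the
# relation matrix untouched: relations are decoupled from raw features.
q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
r_rot = relation_matrix(z @ q)

# Project each row of relations to the embedding size, then fuse.
w_rel = rng.normal(size=(n_objects, dim)) * 0.1
rel_feat = r @ w_rel
w_gate = rng.normal(size=(2 * dim, dim)) * 0.1
b_gate = np.zeros(dim)
fused = gated_fusion(z, rel_feat, w_gate, b_gate)

print(r.shape, fused.shape, np.allclose(r, r_rot))
```

In a trained model the gate and projection weights would be learned end to end; here they are random so that the script runs on its own. The rotation check illustrates why a relation matrix can expose patterns that do not depend on how the objects happen to be encoded.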


Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b5b2/11775302/90aad8d68deb/41598_2025_86804_Fig1_HTML.jpg
