Abstract visual reasoning based on algebraic methods.

Author information

Zheng Mingyang, Wan Weibing, Fang Zhijun

Affiliations

School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, 201620, China.

School of Computer Science and Technology, Donghua University, Shanghai, 201620, China.

Publication information

Sci Rep. 2025 Jan 28;15(1):3482. doi: 10.1038/s41598-025-86804-3.

Abstract

Extracting high-order abstract patterns from complex high-dimensional data forms the foundation of human cognitive abilities. Abstract visual reasoning, which involves identifying abstract patterns embedded within composite images, is considered a core competency of machine intelligence. Traditional neuro-symbolic methods often infer unknown objects through data fitting, without fully exploring the abstract patterns within composite images or the order sensitivity of visual sequences. This paper constructs a relation model with object-centric inductive biases that learns multi-granular rule embeddings at different levels end to end. Through a gating fusion module, the model incrementally integrates explicit representations of objects and abstract relationships. The model incorporates a relational bottleneck method from information theory, separating the input perceptual information from the embeddings of abstract representations, thereby restricting and differentiating feature processing to encourage relational comparisons and induce the extraction of abstract patterns. Furthermore, this paper bridges algebraic operations and machine reasoning through the relational bottleneck method, extracting common patterns of multiple visual objects by identifying invariant sequences within the relational bottleneck matrix. Experimental results on the I-RAVEN dataset demonstrate a total accuracy of 96.8%, surpassing state-of-the-art baseline methods and exceeding human performance (84.4%).
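The two mechanisms named in the abstract, a relational bottleneck and a gating fusion module, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no architectural details, so the function names (relation_matrix, gated_fusion), the use of cosine similarity, and the sigmoid gate with a single weight matrix are all assumptions chosen for illustration. The sketch shows only the general idea that downstream reasoning sees pairwise relations between object embeddings rather than the embeddings themselves.

```python
import numpy as np


def relation_matrix(objects: np.ndarray) -> np.ndarray:
    """Map object embeddings of shape (n, d) to an (n, n) matrix of cosine similarities.

    Assumed form of a relational bottleneck: only pairwise relations are
    passed on, so the raw perceptual features are not visible downstream.
    """
    norms = np.linalg.norm(objects, axis=1, keepdims=True)
    unit = objects / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


def gated_fusion(a: np.ndarray, b: np.ndarray, w_gate: np.ndarray) -> np.ndarray:
    """Blend two feature vectors of shape (d,) with a sigmoid gate.

    w_gate has shape (2d, d) and is a hypothetical learned parameter;
    here it is simply supplied by the caller.
    """
    gate = 1.0 / (1.0 + np.exp(-(np.concatenate([a, b]) @ w_gate)))
    return gate * a + (1.0 - gate) * b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    panels = rng.normal(size=(3, 4))  # three toy object embeddings
    rel = relation_matrix(panels)

    # Rotating the embedding space changes every feature value
    # but leaves the relation matrix unchanged.
    rotation, _ = np.linalg.qr(rng.normal(size=(4, 4)))
    print(np.allclose(rel, relation_matrix(panels @ rotation)))

    fused = gated_fusion(panels[0], panels[1], rng.normal(size=(8, 4)))
    print(fused.shape)
```

Because the relation matrix depends only on inner products, any orthogonal change of the feature basis leaves it unchanged. That is one simple sense in which such a bottleneck separates perceptual detail from relational structure.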

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b5b2/11775302/90aad8d68deb/41598_2025_86804_Fig1_HTML.jpg
