Abstract visual reasoning based on algebraic methods.

Authors

Zheng Mingyang, Wan Weibing, Fang Zhijun

Affiliations

School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, 201620, China.

School of Computer Science and Technology, Donghua University, Shanghai, 201620, China.

Publication information

Sci Rep. 2025 Jan 28;15(1):3482. doi: 10.1038/s41598-025-86804-3.

Abstract

Extracting high-order abstract patterns from complex, high-dimensional data is a foundation of human cognition. Abstract visual reasoning, which involves identifying the abstract patterns embedded in composite images, is considered a core competency of machine intelligence. Traditional neuro-symbolic methods often infer unknown objects through data fitting, without fully exploring the abstract patterns within composite images or the order sensitivity of visual sequences. This paper constructs a relation model with object-centric inductive biases that learns multi-granular rule embeddings end to end at different levels. Through a gating fusion module, the model incrementally integrates explicit representations of objects and abstract relationships. The model incorporates the relational bottleneck method from information theory, which separates the input perceptual information from the embeddings of abstract representations; restricting and differentiating feature processing in this way encourages relational comparisons and induces the extraction of abstract patterns. Furthermore, the paper bridges algebraic operations and machine reasoning through the relational bottleneck method, extracting the common patterns of multiple visual objects by identifying invariant sequences within the relational bottleneck matrix. Experiments on the I-RAVEN dataset show an overall accuracy of 96.8%, surpassing state-of-the-art baseline methods and exceeding the human performance of 84.4%.
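
The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' model: it assumes the relational bottleneck passes on only pairwise inner products between object embeddings, and that the gating fusion is an element-wise sigmoid-weighted mix. The function names, the toy vectors, and the fixed gate logits are all hypothetical; in the paper these quantities are learned end to end.

```python
import math

def relation_matrix(objects):
    """Pairwise inner products between object embeddings.

    Assumption: downstream reasoning sees only this matrix, not the
    raw embeddings, which is the idea behind a relational bottleneck.
    """
    return [[sum(a * b for a, b in zip(u, v)) for v in objects] for u in objects]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(object_feat, relation_feat, gate_logits):
    """Element-wise mix g * object + (1 - g) * relation, with g = sigmoid(logit).

    Assumption: gate logits are given here; a real model would compute
    them from the inputs with learned weights.
    """
    fused = []
    for o, r, z in zip(object_feat, relation_feat, gate_logits):
        g = sigmoid(z)
        fused.append(g * o + (1.0 - g) * r)
    return fused

# Toy embeddings for two rows of three objects. row_b is row_a rotated
# by 90 degrees in feature space, so the "perceptual" values differ
# while the relations between the objects stay the same.
row_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
row_b = [[0.0, 1.0], [-1.0, 0.0], [-1.0, 1.0]]

rel_a = relation_matrix(row_a)
rel_b = relation_matrix(row_b)
same_pattern = rel_a == rel_b  # the relation matrix is invariant here

fused = gated_fusion([1.0], [3.0], [0.0])  # gate of 0.5 averages the inputs
```

In this toy case both rows produce the same relation matrix even though their raw embeddings differ, which is the sense in which a shared pattern can be read off as an invariant of the matrix rather than of the inputs.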

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b5b2/11775302/90aad8d68deb/41598_2025_86804_Fig1_HTML.jpg
