
Human-like scene interpretation by a guided counterstream processing.

Affiliations

Department of Computer Science, the Weizmann Institute of Science, Rehovot 76100, Israel.

Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139.

Publication Information

Proc Natl Acad Sci U S A. 2023 Oct 3;120(40):e2211179120. doi: 10.1073/pnas.2211179120. Epub 2023 Sep 28.

Abstract

In modeling vision, there has been remarkable progress in recognizing a range of scene components, but the problem of analyzing full scenes, an ultimate goal of visual perception, is still largely open. To deal with complete scenes, recent work has focused on training models to extract the full graph-like structure of a scene. In contrast with scene graphs, human scene perception focuses on selected structures in the scene, starting with a limited interpretation and evolving sequentially in a goal-directed manner [G. L. Malcolm, I. I. A. Groen, C. I. Baker, , 843-856 (2016)]. Guidance is crucial throughout scene interpretation, since extracting a full scene representation is often infeasible. Here, we present a model that performs human-like guided scene interpretation, using iterative bottom-up, top-down processing in a "counterstream" structure motivated by cortical circuitry. The process proceeds by the sequential application of top-down instructions that guide the interpretation. The results show how scene structures of interest to the viewer are extracted by an automatically selected sequence of top-down instructions. The model shows two further benefits. One is an inherent capability to deal well with the problem of combinatorial generalization, generalizing broadly to unseen scene configurations, which is limited in current network models [B. Lake, M. Baroni, (2018)]. The second is the ability to combine visual with nonvisual information at each cycle of the interpretation process, a key aspect of modeling human perception as well as advancing AI vision systems.
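The interpretation cycle described in the abstract can be sketched schematically. This is an illustrative toy, not the authors' implementation: the functions `bottom_up`, `select_instruction`, and `interpret`, along with the dictionary-based scene and goal, are hypothetical stand-ins chosen to show the shape of a guided loop in which top-down instructions are selected sequentially and each one directs the next bottom-up extraction, building a partial, goal-directed interpretation rather than a full scene graph.

```python
# Illustrative sketch (assumed structure, not the paper's model) of an
# iterative bottom-up / top-down "counterstream" interpretation loop.

def bottom_up(scene, instruction):
    """Stand-in for the bottom-up pass: extract the scene component
    named by the current top-down instruction, if it is present."""
    return scene.get(instruction)

def select_instruction(goal, interpreted):
    """Stand-in for top-down guidance: pick the next goal-relevant
    component that has not yet been interpreted."""
    for component in goal:
        if component not in interpreted:
            return component
    return None  # goal fully satisfied

def interpret(scene, goal, max_cycles=10):
    """Alternate top-down instruction selection with bottom-up
    extraction, accumulating a partial interpretation."""
    interpretation = {}
    for _ in range(max_cycles):
        instruction = select_instruction(goal, interpretation)
        if instruction is None:
            break  # nothing left that the goal asks for
        result = bottom_up(scene, instruction)
        interpretation[instruction] = result if result is not None else "not found"
    return interpretation

# Toy scene: components the bottom-up pass could in principle extract.
scene = {"person": {"pose": "standing"}, "cup": {"held_by": "person"}}
goal = ["person", "cup"]
print(interpret(scene, goal))
```

The loop terminates either when the goal is satisfied or after a fixed budget of cycles, mirroring the idea that interpretation is selective and sequential rather than exhaustive.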


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c5d6/10556630/9d676e836dd4/pnas.2211179120fig01.jpg
