IEEE Trans Cybern. 2013 Apr;43(2):699-711. doi: 10.1109/TSMCB.2012.2214477. Epub 2013 Mar 7.
In this paper, we present a Bayesian framework for the active multimodal perception of 3-D structure and motion. The design of this framework is inspired by the role of the dorsal perceptual pathway of the human brain. Its component models build upon a common egocentric spatial configuration that is naturally suited to the integration of readings from multiple sensors using a Bayesian approach. In the process, we contribute efficient and robust probabilistic solutions for cyclopean-geometry-based stereovision and for auditory perception based only on binaural cues, modeled with a consistent formalization that allows their hierarchical use as building blocks of the multimodal sensor fusion framework. Using this framework, we explicitly or implicitly address the most important challenges of sensor fusion for vision, audition, and vestibular sensing. Moreover, interaction and navigation require maximal awareness of the spatial surroundings, which, in turn, is obtained through active attentional and behavioral exploration of the environment. The computational models described in this paper will support the construction of a simultaneously flexible and powerful robotic implementation of multimodal active perception for real-world applications such as human-machine interaction and mobile robot navigation.
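The abstract gives no equations or code; the following is a minimal sketch of the kind of Bayesian multimodal fusion it describes, assuming an independent-opinion-pool combination of per-sensor likelihood ratios over a discretized egocentric occupancy grid, with a maximum-entropy rule standing in for active attentional exploration. All names, grid dimensions, and numeric values below are illustrative assumptions, not the paper's actual sensor models.

```python
import numpy as np

# Hypothetical discretized egocentric grid: a handful of cells standing in
# for (distance, azimuth, elevation) bins around the sensor head.
N_CELLS = 6
prior = np.full(N_CELLS, 0.5)        # uniform prior occupancy per cell

# Likelihood ratios P(z | occupied) / P(z | empty) per cell, one vector per
# modality. In the paper these would come from the cyclopean-geometry
# stereovision and binaural auditory models; the numbers here are made up.
vision_lr   = np.array([4.0, 3.0, 0.5, 1.0, 0.8, 1.2])
audition_lr = np.array([2.0, 2.5, 0.7, 1.0, 1.0, 0.9])

def fuse(prior, *likelihood_ratios):
    """Combine independent sensor evidence in odds form:
    posterior odds = prior odds * product of per-sensor likelihood ratios."""
    odds = prior / (1.0 - prior)
    for lr in likelihood_ratios:
        odds = odds * lr
    return odds / (1.0 + odds)

posterior = fuse(prior, vision_lr, audition_lr)

# One simple 'active perception' rule (an assumption, not the paper's
# policy): direct the sensors toward the most uncertain cell, i.e. the
# cell whose occupancy posterior has maximum binary entropy.
entropy = -(posterior * np.log2(posterior)
            + (1.0 - posterior) * np.log2(1.0 - posterior))
target_cell = int(np.argmax(entropy))
print(posterior.round(3), "-> attend to cell", target_cell)
```

Working in odds form makes the conditional-independence assumption explicit: each modality contributes a multiplicative likelihood-ratio term, which is what lets consistently formalized per-sensor models be stacked hierarchically as building blocks of the fusion framework.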