Hospedales Timothy M, Vijayakumar Sethu
Institute of Perception, Action and Behaviour, University of Edinburgh, Edinburgh EH8 9AB, UK.
IEEE Trans Pattern Anal Mach Intell. 2008 Dec;30(12):2140-57. doi: 10.1109/TPAMI.2008.25.
We investigate a solution to the problem of multi-sensor scene understanding by formulating it in the framework of Bayesian model selection and structure inference. Humans robustly associate multimodal data as appropriate, but previous modelling work has focused largely on optimal fusion, leaving segregation unaccounted for and unexploited by machine perception systems. We illustrate a unifying Bayesian solution to multi-sensor perception and tracking that accounts for both integration and segregation by explicit probabilistic reasoning about data association in a temporal context. Such explicit inference of multimodal data association is also of intrinsic interest for higher-level understanding of multisensory data. We demonstrate this using a probabilistic implementation of data association in a multi-party audio-visual scenario, where unsupervised learning and structure inference are used to automatically segment, associate and track individual subjects in audio-visual sequences. Indeed, the structure-inference-based framework introduced in this work provides the theoretical foundation needed to satisfactorily explain many confounding results in human psychophysics experiments involving multimodal cue integration and association.
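To make the abstract's central computation concrete, the following is a minimal sketch (not code from the paper) of Bayesian model selection between an "integrate" structure (one common source generates both cues) and a "segregate" structure (each cue has its own source), for a single audio and a single visual localisation cue under assumed Gaussian noise. All parameter values and function names are hypothetical illustrations.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def association_posterior(x_a, x_v, sigma_a=2.0, sigma_v=1.0,
                          mu0=0.0, sigma0=10.0, p_common=0.5):
    """Posterior probability that audio cue x_a and visual cue x_v share one source.

    sigma_a, sigma_v : assumed sensory noise standard deviations (hypothetical)
    mu0, sigma0      : Gaussian prior over source location
    p_common         : prior probability of the 'common source' structure
    """
    # Common-source structure: marginalising the shared latent location s
    # leaves a correlated bivariate Gaussian over (x_a, x_v).
    cov_common = np.array([[sigma_a**2 + sigma0**2, sigma0**2],
                           [sigma0**2,              sigma_v**2 + sigma0**2]])
    lik_common = multivariate_normal.pdf([x_a, x_v],
                                         mean=[mu0, mu0], cov=cov_common)

    # Independent-sources structure: each cue marginalises its own latent location.
    lik_indep = (norm.pdf(x_a, mu0, np.sqrt(sigma_a**2 + sigma0**2)) *
                 norm.pdf(x_v, mu0, np.sqrt(sigma_v**2 + sigma0**2)))

    # Bayesian model selection over the two association structures.
    return (lik_common * p_common /
            (lik_common * p_common + lik_indep * (1 - p_common)))

print(association_posterior(1.0, 0.5))   # nearby cues: high posterior, integrate
print(association_posterior(8.0, -6.0))  # distant cues: low posterior, segregate
```

Under these assumptions, nearby cues yield a high posterior for the common-source structure (fusion), while widely separated cues favour segregation; this is the qualitative integrate-versus-segregate behaviour the abstract attributes to both human perception and the proposed framework, here stripped of the paper's temporal tracking component.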