Zhang Yongmian, Ji Qiang
Department of Electrical, Computer, and Systems Engineering, Rensselaer Polytechnic Institute, JEC 6003, 110 8th St., Troy, NY 12180, USA.
IEEE Trans Pattern Anal Mach Intell. 2005 May;27(5):699-714. doi: 10.1109/TPAMI.2005.93.
This paper explores the use of a multisensory information fusion technique with dynamic Bayesian networks (DBNs) for modeling and understanding the temporal behavior of facial expressions in image sequences. Our facial feature detection and tracking, based on active IR illumination, provides reliable visual information under variable lighting and head motion. Our approach to facial expression recognition rests on a proposed dynamic, probabilistic framework that combines DBNs with Ekman's Facial Action Coding System (FACS) to systematically model the dynamic and stochastic behavior of spontaneous facial expressions. The framework not only provides a coherent, unified hierarchical probabilistic representation of the spatial and temporal information related to facial expressions, but also allows us to actively select the most informative visual cues from the available information sources to minimize ambiguity in recognition. Facial expressions are recognized by fusing not only the current visual observations but also previous visual evidence; consequently, recognition becomes more robust and accurate through explicit modeling of the temporal behavior of facial expressions. In this paper, we present the theoretical foundation underlying the proposed probabilistic and dynamic framework for facial expression modeling and understanding. Experimental results demonstrate that our approach can accurately and robustly recognize spontaneous facial expressions from an image sequence under different conditions.
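To illustrate the core idea of fusing current observations with previous visual evidence, the following is a minimal sketch of a discrete recursive Bayesian update in the spirit of a DBN forward pass. The expression classes, transition probabilities, and per-frame observation likelihoods here are hypothetical placeholders, not the paper's actual model or parameters.

```python
import numpy as np

# Hypothetical label set for illustration only.
EXPRESSIONS = ["happiness", "surprise", "anger"]

# P(x_t | x_{t-1}): an assumed temporal smoothness model between frames.
TRANSITION = np.array([
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
])

def fuse_step(belief, likelihood):
    """One recursive Bayesian update: predict with the transition model,
    then correct with the current frame's observation likelihood."""
    predicted = TRANSITION.T @ belief   # prior carried over from previous evidence
    posterior = predicted * likelihood  # fuse in the current visual observation
    return posterior / posterior.sum()  # renormalize to a probability vector

# Start from a uniform belief, then apply a short sequence of (hypothetical)
# per-frame likelihoods P(z_t | x_t) as a visual feature tracker might supply.
belief = np.full(3, 1.0 / 3.0)
for likelihood in [np.array([0.7, 0.2, 0.1]),
                   np.array([0.6, 0.3, 0.1]),
                   np.array([0.5, 0.4, 0.1])]:
    belief = fuse_step(belief, likelihood)

print(EXPRESSIONS[int(np.argmax(belief))])  # prints "happiness"
```

Because the posterior at each frame folds in all earlier observations through the predict step, a single noisy frame cannot flip the recognized expression on its own, which is the robustness benefit the abstract attributes to explicit temporal modeling.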