Luo Yu, Ye Jianbo, Adams Reginald B, Li Jia, Newman Michelle G, Wang James Z
College of Information Sciences and Technology, The Pennsylvania State University, University Park, PA, USA.
Present Address: Amazon Lab126, Sunnyvale, CA, USA.
Int J Comput Vis. 2020 Jan;128(1):1-25. doi: 10.1007/s11263-019-01215-y. Epub 2019 Aug 31.
Humans are arguably innately prepared to comprehend others' emotional expressions from subtle body movements. If robots or computers can be empowered with this capability, a number of robotic applications become possible. Automatically recognizing human bodily expression in unconstrained situations, however, is daunting given the incomplete understanding of the relationship between emotional expressions and body movements. The current research, a multidisciplinary effort spanning computer and information sciences, psychology, and statistics, proposes a scalable and reliable crowdsourcing approach for collecting in-the-wild perceived emotion data so that computers can learn to recognize the body language of humans. To accomplish this task, a large and growing annotated dataset, named the Body Language Dataset (BoLD), has been created, containing 9,876 video clips of body movements and 13,239 human characters. Comprehensive statistical analysis of the dataset revealed many interesting insights. A system that models emotional expressions based on bodily movements, named Automated Recognition of Bodily Expression of Emotion (ARBEE), has also been developed and evaluated. Our analysis shows the effectiveness of Laban Movement Analysis (LMA) features in characterizing arousal, and our experiments using LMA features further demonstrate the computability of bodily expression. We also report and compare results of several baseline methods developed for action recognition, based on two different modalities: body skeleton and raw image. The dataset and findings presented in this work will likely serve as a launchpad for future discoveries in body language understanding, enabling future robots to interact and collaborate more effectively with humans.
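To make the LMA-based approach concrete, the following is a minimal sketch, not the authors' ARBEE pipeline, of how LMA-inspired kinematic features (joint speed, acceleration, jerk) could be extracted from skeleton sequences and used to regress arousal. The function name lma_kinematic_features, the random-forest regressor, and the synthetic clips and labels are illustrative assumptions; the published system uses a richer LMA feature set and real BoLD annotations.

```python
# Illustrative sketch only (assumptions throughout): compute simple
# LMA-Effort-style kinematic statistics from 2D skeleton sequences and
# regress a continuous arousal score from them.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def lma_kinematic_features(skeleton: np.ndarray) -> np.ndarray:
    """Summarize one skeleton clip as motion statistics.

    skeleton: array of shape (T, J, 2) -- T frames, J joints, (x, y) coords.
    Returns a fixed-length vector of mean/std/max magnitudes of the
    frame-to-frame velocity, acceleration, and jerk across all joints.
    """
    vel = np.diff(skeleton, axis=0)   # (T-1, J, 2) velocity
    acc = np.diff(vel, axis=0)        # (T-2, J, 2) acceleration
    jerk = np.diff(acc, axis=0)       # (T-3, J, 2) jerk
    feats = []
    for d in (vel, acc, jerk):
        mag = np.linalg.norm(d, axis=-1)   # per-joint, per-frame magnitude
        feats.extend([mag.mean(), mag.std(), mag.max()])
    return np.asarray(feats)

# Synthetic stand-in for annotated clips: 200 clips, 30 frames, 18 joints.
rng = np.random.default_rng(0)
clips = rng.normal(size=(200, 30, 18, 2)).cumsum(axis=1)  # random smooth motion
arousal = rng.uniform(-1.0, 1.0, size=200)                # placeholder labels

X = np.stack([lma_kinematic_features(c) for c in clips])
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, arousal)
print("Example arousal prediction:", model.predict(X[:1])[0])
```

Aggregating per-joint magnitudes into clip-level statistics is one simple way to obtain a fixed-length representation from variable-length motion; it mirrors the intuition behind LMA Effort qualities, where more energetic movement tends to accompany higher perceived arousal.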