Department of Neurosciences, Imaging and Clinical Sciences, University G. d'Annunzio of Chieti-Pescara, 9, 66100 Chieti, Italy.
Sensors (Basel). 2021 Sep 27;21(19):6438. doi: 10.3390/s21196438.
An intriguing challenge in the human-robot interaction field is the prospect of endowing robots with emotional intelligence to make the interaction more genuine, intuitive, and natural. A crucial aspect in achieving this goal is the robot's capability to infer and interpret human emotions. Thanks to its design and open programming platform, the NAO humanoid robot is one of the most widely used agents for human interaction. As in person-to-person communication, facial expressions are the privileged channel for recognizing the interlocutor's emotional expressions. Although NAO is equipped with a facial expression recognition module, specific use cases may require additional features and affective computing capabilities that are not currently available. This study proposes a highly accurate convolutional-neural-network-based facial expression recognition model that further enhances the NAO robot's awareness of human facial expressions and gives the robot the capability to detect an interlocutor's arousal level. When tested during human-robot interactions, the model was 91% and 90% accurate in recognizing happy and sad facial expressions, respectively; 75% accurate in recognizing surprised and scared expressions; and less accurate in recognizing neutral and angry expressions. Finally, the model was successfully integrated into the NAO SDK, allowing for high-performing facial expression classification with an inference time of 0.34 ± 0.04 s.
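The per-expression accuracies and the inference time quoted in the abstract are the kind of figures that can be derived from a list of predictions and a set of timed model calls. The following is a minimal sketch of how such figures might be computed, not the authors' evaluation code. It assumes that "accuracy" for an expression means the fraction of samples of that expression classified correctly, and that the "±" denotes a sample standard deviation; neither is stated in the abstract. The function names and the toy labels are hypothetical.

```python
import statistics
import time
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of samples of each true class that were predicted correctly."""
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: correct[label] / totals[label] for label in totals}


def time_inference(predict, inputs):
    """Return (mean, sample standard deviation) of per-call latency in seconds.

    `predict` is any callable standing in for the classifier; at least two
    inputs are needed for the standard deviation to be defined.
    """
    durations = []
    for x in inputs:
        start = time.perf_counter()
        predict(x)
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), statistics.stdev(durations)


# Toy labels, invented purely for illustration (not data from the study).
y_true = ["happy", "happy", "sad", "sad",
          "surprised", "surprised", "surprised", "surprised"]
y_pred = ["happy", "happy", "sad", "happy",
          "surprised", "surprised", "surprised", "scared"]
accuracy = per_class_accuracy(y_true, y_pred)

# A placeholder callable stands in for the real model here.
mean_s, std_s = time_inference(lambda frame: frame, range(5))
print(accuracy)
print(f"inference time: {mean_s:.6f} ± {std_s:.6f} s")
```

Timing each call separately, rather than dividing one total by the number of calls, is what makes a spread such as ± 0.04 s reportable alongside the mean.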