Department of Computer Science and Technology, Centre for Computational Mental Healthcare, Research Institute of Data Science, Tsinghua University, Beijing 100084, China.
Sensors (Basel). 2020 Sep 28;20(19):5552. doi: 10.3390/s20195552.
Stress has become an increasingly serious problem in modern society, threatening people's well-being. With the ubiquitous deployment of video cameras in everyday surroundings, detecting stress with contact-free camera sensors becomes a cost-effective, widely deployable approach that is free from interference by artificial traits and factors. In this study, we leverage users' facial expressions and action motions in video and present a two-leveled stress detection network (TSDNet). TSDNet first learns face- and action-level representations separately, and then fuses the results through a stream-weighted integrator with local and global attention for stress identification. To evaluate the performance of TSDNet, we constructed a video dataset containing 2092 labeled video clips. The experimental results on this dataset show that: (1) TSDNet outperformed hand-crafted feature engineering approaches, achieving a detection accuracy of 85.42% and an F1-score of 85.28%, demonstrating the feasibility and effectiveness of using deep learning to analyze a person's facial expressions and action motions; and (2) considering both facial expressions and action motions improved detection accuracy and F1-score by over 7% compared with methods that consider the face or actions alone.
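The stream-weighted fusion described above can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's implementation: the attention machinery is reduced to two scalar stream weights (`w_face`, `w_action` are assumed names), and each stream's output is taken as class logits over {stressed, not stressed}.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_streams(face_logits, action_logits, w_face, w_action):
    """Fuse two stream predictions with learned scalar weights.

    Hypothetical sketch: TSDNet's actual integrator uses local and
    global attention; here the stream weights are just normalized
    scalars, and the fused output is a weighted sum of per-stream
    class probabilities.
    """
    weights = softmax(np.array([w_face, w_action]))
    face_probs = softmax(face_logits)
    action_probs = softmax(action_logits)
    return weights[0] * face_probs + weights[1] * action_probs

# Toy example: face stream leans "stressed", action stream disagrees.
face_logits = np.array([2.0, 0.5])    # [stressed, not stressed]
action_logits = np.array([1.0, 1.5])
p = fuse_streams(face_logits, action_logits, w_face=0.8, w_action=0.2)
print(p)  # fused probabilities, sum to 1
```

Because the face stream carries the larger weight in this toy setting, the fused prediction follows the face stream's "stressed" verdict even though the action stream disagrees, which is the point of learning per-stream weights rather than averaging uniformly.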