Yan Keshan, Miao Shengfa, Jin Xin, Mu Yongkang, Zheng Hongfeng, Tian Yuling, Wang Puming, Yu Qian, Hu Da
School of Software, Yunnan University, Kunming 650000, China.
Engineering Research Center of Cyberspace, Yunnan University, Kunming 650000, China.
Life (Basel). 2024 Oct 16;14(10):1313. doi: 10.3390/life14101313.
Automatic video-based recognition of depression is becoming increasingly important in clinical applications. However, existing depression recognition models still face practical challenges: high computational cost, ineffective use of facial movement features, and degradation of spatial features when separate models are stitched together. To address these challenges, this work proposes a lightweight Time-Context Enhanced Depression Detection Network (TCEDN). First, attention-weighted blocks aggregate and enhance frame-level video features, reducing the model's computational workload. Next, the temporal and spatial changes of raw video features and facial movement features are integrated using self-learned weights, which improves the precision of depression detection. Finally, a fusion network combining a 3-Dimensional Convolutional Neural Network (3D-CNN) and a Convolutional Long Short-Term Memory network (ConvLSTM) predicts depression scores while minimizing spatial feature loss by avoiding feature flattening. Experiments on the AVEC2013 and AVEC2014 datasets show that the approach yields results on par with state-of-the-art video-based depression detection methods, with significantly lower computational complexity than mainstream methods.
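The first two steps of the abstract (attention-weighted aggregation of frame-level features, and fusion of raw and facial movement features with self-learned weights) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `aggregate_frames` and `fuse_streams`, the use of a softmax to normalize weights, and the plain-list feature vectors are all assumptions made for illustration. In a trained network the scores and logits would be learned parameters rather than fixed inputs.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Normalize raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_frames(frames: Sequence[Sequence[float]],
                     scores: Sequence[float]) -> List[float]:
    """Collapse a block of frame feature vectors into one vector.

    Assumption: each frame has one scalar attention score; frames with
    higher scores contribute more to the aggregated feature.
    """
    weights = softmax(scores)
    dim = len(frames[0])
    return [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(dim)]


def fuse_streams(raw: Sequence[float],
                 motion: Sequence[float],
                 logits: Sequence[float]) -> List[float]:
    """Blend raw-appearance and facial-movement features.

    Assumption: two learnable logits are normalized into mixing weights,
    so the balance between the streams is learned rather than hand-set.
    """
    w_raw, w_motion = softmax(logits)
    return [w_raw * r + w_motion * m for r, m in zip(raw, motion)]


if __name__ == "__main__":
    block = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    pooled = aggregate_frames(block, scores=[0.2, 0.1, 1.5])
    fused = fuse_streams(pooled, motion=[0.3, 0.3], logits=[0.0, 0.0])
    print(pooled, fused)
```

Aggregating several frames into one feature before the later layers is what reduces the amount of data the rest of the network must process; the sketch keeps only that idea and omits the 3D-CNN and ConvLSTM stages.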