School of Nursing, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China.
Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine Peking Union Medical College, Beijing, China.
Int J Geriatr Psychiatry. 2022 Nov;37(11). doi: 10.1002/gps.5827.
This study aimed to develop a classification model to detect and distinguish apathy and depression based on text, audio, and video features, and to use the Shapley additive explanations (SHAP) toolkit to increase the model's interpretability.
Subjective scales and objective experiments were conducted on 319 patients with mild cognitive impairment (MCI) to measure apathy and depression. The MCI patients were classified into four groups: depression only, apathy only, depressed-apathetic, and normal. Speech, facial, and text features were extracted using open-source data analysis toolkits. Multiclass classification and the SHAP toolkit were used to develop the classification model and explain the contribution of specific features.
The macro-averaged F1 score and accuracy for the overall model were 0.91 and 0.90, respectively. The accuracies for the apathetic, depressed, depressed-apathetic, and normal groups were 0.98, 0.88, 0.93, and 0.82, respectively. The SHAP toolkit identified speech features (Mel-frequency cepstral coefficient (MFCC) 4, spectral slopes, F0, F1), facial features (action units (AU) 14, 26, 28, 45), and a text feature (text 6 semantic) associated with apathy. Meanwhile, speech features (spectral slopes, shimmer, F0) and facial features (AU 2, 6, 7, 10, 14, 26, 45) were associated with depression. Apart from the shared features mentioned above, additional speech (MFCC 2, loudness) and facial (AU 9) features were observed in the depressed-apathetic group.
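For readers unfamiliar with the macro-averaged F1 metric reported above: it computes an F1 score per class and averages them with equal weight, so minority groups (e.g., the smaller diagnostic groups here) count as much as the majority. The sketch below illustrates the computation on hypothetical labels for the four groups; the toy predictions are illustrative only and are not the study's data.

```python
# Hypothetical class labels, not the study's data:
# 0 = normal, 1 = apathy only, 2 = depression only, 3 = depressed-apathetic.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 0, 3, 3]

def macro_f1(y_true, y_pred):
    """Average the per-class F1 scores with equal weight per class."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        # One-vs-rest counts for class c.
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)

print(round(macro_f1(y_true, y_pred), 3))  # → 0.742
```

Because every class contributes equally, macro-F1 penalizes a model that performs well only on the largest group, which is why it is a common headline metric for imbalanced multiclass problems like this one.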
Apathy and depression shared some verbal and facial features while also exhibiting distinct ones. A combination of text, audio, and video features could be used to improve the early detection and differential diagnosis of apathy and depression in MCI patients.