Hartnagel Lisa-Marie, Emden Daniel, Foo Jerome C, Streit Fabian, Witt Stephanie H, Frank Josef, Limberger Matthias F, Schmitz Sara E, Gilles Maria, Rietschel Marcella, Hahn Tim, Ebner-Priemer Ulrich W, Sirignano Lea
Mental mHealth Lab, Institute of Sports and Sports Science, Karlsruhe Institute of Technology, Hertzstr. 16, Building 06.31, Karlsruhe, 76187, Germany, 49 721 608 47543.
Medical Machine Learning Lab, Institute for Translational Psychiatry, University of Münster, Münster, Germany.
JMIR Ment Health. 2024 Dec 23;11:e64578. doi: 10.2196/64578.
Mobile devices for remote monitoring are indispensable tools for supporting treatment and patient care, especially in recurrent diseases such as major depressive disorder. The aim of this study was to determine whether machine learning (ML) models based on longitudinal speech data can help predict momentary depression severity. Data analyses were based on a dataset of 30 inpatients who, during an acute depressive episode, received sleep deprivation therapy, an intervention that induces a rapid change in depressive symptoms within a short period of time. Using an ambulatory assessment approach, we captured speech samples and assessed concomitant depression severity via self-report questionnaire over the course of 3 weeks (before, during, and after therapy). We extracted 89 speech features from the speech samples: the features of the Extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS), computed with the Open-Source Speech and Music Interpretation by Large-Space Extraction (openSMILE; audEERING) toolkit, plus the additional parameter speech rate.
We aimed to understand if a multiparameter ML approach would significantly improve the prediction compared to previous statistical analyses, and, in addition, which mechanism for splitting training and test data was most successful, especially focusing on the idea of personalized prediction.
To do so, we trained and evaluated a set of >500 ML pipelines including random forest, linear regression, support vector regression, and Extreme Gradient Boosting regression models and tested them on 5 different train-test split scenarios: a group 5-fold nested cross-validation at the subject level, a leave-one-subject-out approach, a chronological split, an odd-even split, and a random split.
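The five split scenarios differ in whether a model is tested on patients or time points it has already seen. The following is a minimal sketch of that distinction, not the authors' pipeline. It uses hypothetical toy records of the form (subject_id, time_index), shows only the outer loop of the group 5-fold scheme (the nested inner loop for model selection is omitted), and assumes details the abstract does not state: a 75% training fraction and per-subject application of the chronological and odd-even splits.

```python
import random
from collections import defaultdict


def make_toy_samples(n_subjects=6, n_per_subject=8):
    """Hypothetical records (subject_id, time_index); not the study data."""
    return [(s, t) for s in range(n_subjects) for t in range(n_per_subject)]


def _by_subject(samples):
    """Group records by subject and sort each group by time."""
    grouped = defaultdict(list)
    for record in samples:
        grouped[record[0]].append(record)
    for rows in grouped.values():
        rows.sort(key=lambda r: r[1])
    return grouped


def group_k_fold(samples, k=5, seed=0):
    """Subject-level folds: each subject appears in exactly one test fold."""
    subjects = sorted({s for s, _ in samples})
    random.Random(seed).shuffle(subjects)
    for i in range(k):
        held_out = set(subjects[i::k])
        train = [r for r in samples if r[0] not in held_out]
        test = [r for r in samples if r[0] in held_out]
        yield train, test


def leave_one_subject_out(samples):
    """One fold per subject: test on that subject, train on all others."""
    for subject in sorted({s for s, _ in samples}):
        train = [r for r in samples if r[0] != subject]
        test = [r for r in samples if r[0] == subject]
        yield train, test


def chronological_split(samples, train_fraction=0.75):
    """Per subject: earliest records for training, latest for testing.
    The 0.75 fraction is an assumption, not taken from the abstract."""
    train, test = [], []
    for rows in _by_subject(samples).values():
        cut = int(len(rows) * train_fraction)
        train += rows[:cut]
        test += rows[cut:]
    return train, test


def odd_even_split(samples):
    """Per subject: alternate time-ordered records between train and test."""
    train, test = [], []
    for rows in _by_subject(samples).values():
        train += rows[0::2]
        test += rows[1::2]
    return train, test


def random_split(samples, train_fraction=0.75, seed=0):
    """Shuffle all records regardless of subject or time, then cut."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


samples = make_toy_samples()
odd_even_train, odd_even_test = odd_even_split(samples)
chrono_train, chrono_test = chronological_split(samples)
```

In the first two scenarios the test subjects never appear in training, so the model must generalize to unseen patients. In the last three, every subject contributes to both sets, which is closer to a personalized setting.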
In the 5-fold cross-validation, leave-one-subject-out, and chronological split approaches, none of the models performed significantly better than chance. The odd-even and random split approaches each produced significant results for at least one of the models tested, with similar performance. Overall, the best-performing model was an Extreme Gradient Boosting model in the odd-even split approach (R²=0.339, mean absolute error=0.38; both P<.001), indicating that 33.9% of the variance in depression severity could be predicted by the speech features.
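For readers less familiar with the two reported metrics, the sketch below computes them on made-up numbers. It assumes R² is the coefficient of determination (1 minus the ratio of residual to total sum of squares); the abstract does not say which definition was used, and the values here are illustrative only, not study data.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (hypothetical severity scores and predictions).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]

mae = mean_absolute_error(y_true, y_pred)
r2 = r_squared(y_true, y_pred)

# A model that always predicts the mean of the observed scores has R² = 0.
baseline_r2 = r_squared(y_true, [2.5] * len(y_true))
```

Under this definition, R² on held-out data is 0 for a model that always predicts the observed mean and becomes negative for models that do worse than that.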
Overall, our analyses show that the ML models failed to predict depression scores of unseen patients, but that prediction performance improved markedly compared with our previous analyses using multilevel models. We conclude that future personalized ML models might improve prediction performance further, leading to better patient management and care.