Ciharova Marketa, Amarti Khadicha, van Breda Ward, Gevonden Martin J, Ghassemi Sina, Kleiboer Annet, Vinkers Christiaan H, Sep Milou S C, Trofimova Sophia, Cooper Alexander C, Peng Xianhua, Schulte Mieke, Karyotaki Eirini, Cuijpers Pim, Riper Heleen
Department of Clinical, Neuro- and Developmental Psychology, Amsterdam Public Health Research Institute, Vrije Universiteit Amsterdam, Amsterdam, Netherlands.
Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, Netherlands.
Front Psychiatry. 2025 Jun 13;16:1548287. doi: 10.3389/fpsyt.2025.1548287. eCollection 2025.
BACKGROUND: Early detection of elevated acute stress is necessary to reduce the consequences associated with prolonged or recurrent stress exposure. Stress monitoring may be supported by valid and reliable machine-learning algorithms. However, research on algorithms that detect stress severity on a continuous scale is lacking, because such analyses place high demands on data quality. Use of multimodal data, meaning data from multiple sources, might contribute to machine-learning detection of stress severity. We aimed to detect laboratory-induced stress using multimodal data and to identify challenges researchers may encounter when conducting a similar study.
METHODS: We conducted a preliminary exploration of how well a machine-learning algorithm trained on multimodal data, namely visual, acoustic, verbal, and physiological features, detects stress severity following a partially automated online version of the Trier Social Stress Test. College students (n = 42; mean age = 20.79 years, 69% female) completed a self-reported stress visual analogue scale at five time-points: after the initial resting period (P1), during the three stress-inducing tasks (i.e., preparation for a presentation, a presentation task, and an arithmetic task; P2-4), and after a recovery period (P5). For the whole duration of the experiment, we recorded the participants' voice and facial expressions with a video camera and measured cardiovascular and electrodermal physiology with an ambulatory monitoring system. We then evaluated the performance of the algorithm in detecting stress severity using three combinations of visual, acoustic, verbal, and physiological data collected at each period of the experiment (P1-5).
RESULTS: Participants reported minimal (P1, M = 21.79, SD = 17.45) to moderate stress severity (P2, M = 47.95, SD = 15.92), depending on the period. We found a very weak association between the detected and observed scores (r = .154; p = .021). In our analysis, we classified participants into categories of stressed and non-stressed individuals. When applying all available features (i.e., visual, acoustic, verbal, and physiological), or a combination of visual, acoustic, and verbal features, performance ranged from acceptable to good, but only for the presentation task (accuracy up to .71, F1-score up to .73).
CONCLUSIONS: The complexity of input features needed for machine-learning detection of stress severity based on multimodal data requires large samples with wide variability of stress reactions and inputs among participants. Such samples are difficult to recruit for laboratory settings because of the high time and effort demands on both researchers and participants. The resources needed may be reduced by automating experimental procedures, which may, however, introduce additional technological challenges and potentially cause other recruitment setbacks. Further investigation is necessary, with an emphasis on high-quality ground truth, i.e., gold standard (self-report) instruments, and also outside laboratory experiments, mainly in general populations and mental health care patients.
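The abstract reports three kinds of evaluation metrics: a correlation between detected and observed stress scores, and accuracy and F1-score for the stressed versus non-stressed classification. The following is a minimal sketch of how such metrics are conventionally computed, not the authors' pipeline. It assumes the correlation is a Pearson correlation, and the function names, the toy scores, and the cut-off of 50 on the visual analogue scale are all hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def binarize(scores, threshold):
    """Label a score as stressed (1) if it reaches the threshold, else 0.

    The threshold is a hypothetical cut-off; the abstract does not state one.
    """
    return [1 if s >= threshold else 0 for s in scores]


def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1-score for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


if __name__ == "__main__":
    # Toy values for illustration only; these are NOT data from the study.
    observed = [20, 50, 65, 40, 25]   # self-reported scale scores
    detected = [30, 45, 50, 55, 35]   # scores produced by a model

    print("r =", round(pearson_r(observed, detected), 3))

    cutoff = 50  # hypothetical cut-off for "stressed"
    acc, f1 = accuracy_and_f1(binarize(observed, cutoff),
                              binarize(detected, cutoff))
    print("accuracy =", round(acc, 2), "F1 =", round(f1, 2))
```

Reporting F1 alongside accuracy is useful when the stressed and non-stressed classes are unbalanced, since accuracy alone can look good for a model that mostly predicts the majority class.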