Human Media Interaction group, University of Twente, Drienerlolaan 5, Enschede, 7522NB, Netherlands, 31 534893740.
Roessingh Research and Development, Enschede, Netherlands.
JMIR Form Res. 2024 Nov 28;8:e63262. doi: 10.2196/63262.
Artificial intelligence (AI) tools hold much promise for mental health care by increasing the scalability and accessibility of care. However, current development and evaluation practices of AI tools limit their meaningfulness for health care contexts and therefore also the practical usefulness of such tools for professionals and clients alike.
The aim of this study is to demonstrate the evaluation of an AI monitoring tool that detects the need for more intensive care in a web-based grief intervention for older mourners who have lost their spouse, with the goal of moving toward meaningful evaluation of AI tools in e-mental health.
We leveraged the insights from three evaluation approaches: (1) the F1-score evaluated the tool's capacity to classify user monitoring parameters as either in need of more intensive support or recommendable to continue using the web-based grief intervention as is; (2) we used linear regression to assess the predictive value of users' monitoring parameters for clinical changes in grief, depression, and loneliness over the course of a 10-week intervention; and (3) we collected qualitative experience data from e-coaches (N=4) who incorporated the monitoring in their weekly email guidance during the 10-week intervention.
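As a minimal sketch of evaluation approach (1), the F1-score is the harmonic mean of precision and recall for one target class. The label lists below are hypothetical and are not study data. Treating "needs more intensive support" (coded 1) as the positive class is an assumption; the abstract does not state which class was scored.

```python
def f1_score(true_labels, predicted_labels, positive=1):
    """Return the F1-score (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_pos = sum(1 for t, p in pairs if t != positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    if true_pos == 0:
        # No correct positive predictions: F1 is defined as 0 here.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference decisions and tool recommendations
# (1 = more intensive support, 0 = continue the intervention as is).
reference = [1, 1, 1, 0, 0, 1, 0, 1]
tool = [1, 1, 0, 0, 1, 1, 0, 1]

score = f1_score(reference, tool)
print(round(score, 2))
```

In this toy example, precision and recall are both 4/5, so the F1-score is also 0.8.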
Based on n=174 binary recommendation decisions, the F1-score of the monitoring tool was 0.91. Because depression and loneliness scores changed only minimally over the 10-week intervention, only 1 linear regression was conducted, with the difference in grief scores before and after the intervention as the dependent variable. The predictors were participants' (N=21) mean score on the self-report monitoring, the estimated slope of their individually fitted growth curves, and the standard error of that slope (ie, participants' response pattern to the monitoring questions). Only the mean monitoring score exhibited predictive value for the observed change in grief (R2=1.19, SE 0.33; t16=3.58, P=.002). The e-coaches appreciated the monitoring tool as an opportunity to confirm their initial impression about intervention participants, personalize their email guidance, and detect when participants' mental health deteriorated during the intervention.
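The three predictors described above can be sketched for a single participant as follows. This is only an illustration. It assumes the "growth curve" is a straight line fitted by ordinary least squares to weekly monitoring scores, which the abstract does not confirm. The function name `monitoring_features` and the example scores are hypothetical.

```python
import math


def monitoring_features(weekly_scores):
    """Return (mean score, fitted slope, standard error of the slope).

    Assumes one score per week and a linear trend over week number;
    needs at least 3 scores so the residual variance can be estimated.
    """
    n = len(weekly_scores)
    weeks = list(range(1, n + 1))
    mean_week = sum(weeks) / n
    mean_score = sum(weekly_scores) / n

    # Ordinary least squares slope and intercept.
    sxx = sum((w - mean_week) ** 2 for w in weeks)
    sxy = sum((w - mean_week) * (s - mean_score)
              for w, s in zip(weeks, weekly_scores))
    slope = sxy / sxx
    intercept = mean_score - slope * mean_week

    # Standard error of the slope from the residual variance (n - 2 df).
    residual_ss = sum((s - (intercept + slope * w)) ** 2
                      for w, s in zip(weeks, weekly_scores))
    slope_se = math.sqrt(residual_ss / (n - 2) / sxx)
    return mean_score, slope, slope_se


# Hypothetical weekly monitoring scores for one participant.
mean_score, slope, slope_se = monitoring_features([2, 4, 5, 4, 5])

# Arithmetic check on the reported result: estimate divided by its SE.
t_ratio = 1.19 / 0.33  # about 3.6
```

Dividing the reported estimate by its standard error gives about 3.6, in line with the reported t statistic of 3.58 once rounding is allowed for. Because a coefficient of determination cannot exceed 1, the value labeled R2 may be a regression coefficient; the abstract does not confirm this.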
The monitoring tool evaluated in this paper identified a need for more intensive support reasonably well in a nonclinical sample of older mourners, had some predictive value for the change in grief symptoms during a 10-week intervention, and was appreciated as an additional source of mental health information by e-coaches who supported mourners during the intervention. Each evaluation approach in this paper came with its own set of limitations, including (1) skewed class distributions in prediction tasks based on real-life health data and (2) choosing meaningful statistical analyses based on clinical trial designs that are not targeted at evaluating AI tools. However, combining multiple evaluation methods facilitates drawing meaningful conclusions about the clinical value of AI monitoring tools for their intended mental health context.
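Limitation (1) can be illustrated with a small sketch. The 90/10 class split below is hypothetical and is not the study's distribution. It shows how a rule that always recommends continuing the intervention can score a high F1 on the majority class while detecting no one who needs more support.

```python
def f1_for_class(true_labels, predicted_labels, positive):
    """F1-score for one class, computed from true/false positive and negative counts."""
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_pos = sum(1 for t, p in pairs if t != positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    if true_pos == 0:
        return 0.0
    return 2 * true_pos / (2 * true_pos + false_pos + false_neg)


# Hypothetical skewed data: 90 "continue" decisions, 10 "more_support" decisions.
true_labels = ["continue"] * 90 + ["more_support"] * 10

# A trivial rule that never flags anyone for more intensive support.
always_continue = ["continue"] * 100

f1_continue = f1_for_class(true_labels, always_continue, "continue")
f1_more_support = f1_for_class(true_labels, always_continue, "more_support")
macro_f1 = (f1_continue + f1_more_support) / 2

print(round(f1_continue, 3), f1_more_support, round(macro_f1, 3))
```

Reporting the F1-score per class, or a macro average, makes this kind of failure visible; a single F1 value on the majority class does not.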