Chen Donghao, Wang Pengfei, Zhang Xiaolong, Qiao Runqi, Li Nanxi, Zhang Xiaodong, Zhang Honggang, Wang Gang
School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China.
Beijing Key Laboratory of Mental Disorders, National Clinical Research Center for Mental Disorders & National Center for Mental Disorders, Beijing Anding Hospital, Capital Medical University, Beijing, China.
JMIR Form Res. 2025 May 30;9:e56057. doi: 10.2196/56057.
Conventional approaches for major depressive disorder (MDD) screening rely on two effective but subjective paradigms: self-rated scales and clinical interviews. Artificial intelligence (AI) can potentially contribute to psychiatry, especially through the use of objective data such as audiovisual signals.
This study aimed to evaluate the efficacy of different paradigms using AI analysis of audiovisual signals.
We recruited 89 participants (mean age, 37.1 years; male: 30/89, 33.7%; female: 59/89, 66.3%), including 41 patients with MDD and 48 asymptomatic participants. We developed AI models using facial movement, acoustic, and text features extracted from videos obtained via a tool, incorporating four paradigms: conventional scale (CS), question and answering (Q&A), mental imagery description (MID), and video watching (VW). Ablation experiments and 5-fold cross-validation were performed using two AI methods to ascertain the efficacy of paradigm combinations. Attention scores from the deep learning model were calculated and compared with correlation results to assess comprehensibility.
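The 5-fold cross-validation protocol described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature matrix is a synthetic stand-in for the facial movement, acoustic, and text features extracted from the recorded videos, and the logistic regression classifier is a hypothetical stand-in for the two AI methods used in the study.

```python
# Sketch of clip-level 5-fold cross-validation with binary accuracy and
# sensitivity, the metrics reported in the Results.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(0)
n_clips, n_features = 200, 32
X = rng.normal(size=(n_clips, n_features))  # stand-in multimodal features
y = rng.integers(0, 2, size=n_clips)        # 1 = MDD, 0 = asymptomatic

accs, sens = [], []
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    pred = clf.predict(X[test_idx])
    accs.append(accuracy_score(y[test_idx], pred))
    sens.append(recall_score(y[test_idx], pred))  # binary sensitivity

print(f"mean accuracy {np.mean(accs):.2%}, mean sensitivity {np.mean(sens):.2%}")
```

Ablation over paradigm combinations would repeat this loop with feature subsets restricted to clips from each paradigm (CS, Q&A, MID, VW) and their combinations.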
In video clip-based analyses, Q&A outperformed MID with a mean binary sensitivity of 79.06% (95% CI 77.06%-83.35%; P=.03) and an effect size of 1.0. Among individuals, the combination of Q&A and MID outperformed MID alone with a mean extent accuracy of 80.00% (95% CI 65.88%-88.24%; P=.01) and an effect size of 0.61. The mean binary accuracy exceeded 76.25% for video clip predictions and 74.12% for individual-level predictions across the two AI methods, with a top individual binary accuracy of 94.12%. The features with high attention scores overlapped significantly with those that were statistically correlated, comprising 18 features (all P<.05), while also aligning with established nonverbal markers.
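The comprehensibility check reported above, comparing features ranked highly by the deep learning model's attention with features statistically correlated with MDD status, can be sketched as follows. All names and the attention proxy here are hypothetical: in the study, attention scores come from the trained deep model, whereas this sketch derives a simple between-group separation score on synthetic data.

```python
# Sketch: overlap between top-attended features and label-correlated features.
import numpy as np
from scipy.stats import pointbiserialr

rng = np.random.default_rng(1)
n, d = 100, 20
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, d))
X[:, :5] += y[:, None] * 1.5  # plant 5 label-related features

# Hypothetical attention proxy: absolute between-group mean difference.
attention = np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0))

# Features significantly correlated with the binary label (point-biserial).
correlated = {j for j in range(d) if pointbiserialr(y, X[:, j]).pvalue < .05}
top_attended = set(np.argsort(attention)[-5:])

print("overlapping features:", sorted(top_attended & correlated))
```

A large overlap, as found for 18 features in the study, suggests the model attends to statistically meaningful signals rather than spurious ones.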
The Q&A paradigm demonstrated higher efficacy than MID, both individually and in combination. Using AI to analyze audiovisual signals across multiple paradigms has the potential to be an effective tool for MDD screening.