Luo Xuexing, Li Yiyuan, Xu Jing, Zheng Zhong, Ying Fangtian, Huang Guanghui
Faculty of Humanities and Arts, Macau University of Science and Technology, Macau, China.
Industrial and Manufacturing Engineering, European Academy of Engineering, Gothenburg, Sweden.
J Med Internet Res. 2025 Jun 23;27:e72398. doi: 10.2196/72398.
This systematic review explores the current applications, potential benefits, and open issues of artificial intelligence (AI) in medical questionnaires, focusing on 3 main functions: assessment, development, and prediction.

The global mental health burden remains severe. The World Health Organization reports that more than 1 billion people worldwide experience mental disorders, with the prevalence of depression and anxiety among children and adolescents at 2.6% and 6.5%, respectively. However, commonly used clinical questionnaires such as the Hamilton Depression Rating Scale and the Beck Depression Inventory have known weaknesses: symptoms of depression overlap heavily with those of other psychiatric disorders, and the questionnaires are often administered without professional supervision. Both problems often lead to inaccurate diagnoses. In the wake of the COVID-19 pandemic, the health care system faces the dual challenge of a surge in patient numbers and increasingly complex mental health issues. AI has shown promise in improving diagnostic accuracy, assisting clinical decision-making, and simplifying questionnaire development and data analysis.

To systematically assess the value of AI in medical questionnaires, this study searched 5 databases (PubMed, Embase, Cochrane Library, Web of Science, and China National Knowledge Infrastructure) from database inception to September 2024.

Of 49,091 publications, 14 (0.03%) studies met the inclusion criteria. In assessment, AI technologies performed well; for example, one approach distinguished myalgic encephalomyelitis or chronic fatigue syndrome from long COVID-19 with 92.18% accuracy. In questionnaire development, natural language processing with generative models such as ChatGPT was used to construct culturally competent scales. In disease prediction, one study reported an area under the curve of 0.790 for cataract surgery risk prediction. Overall, 24 AI technologies were identified, covering traditional algorithms such as random forest, support vector machine, and k-nearest neighbor, as well as deep learning models such as convolutional neural networks, Bidirectional Encoder Representations From Transformers, and ChatGPT. Despite these positive findings, only 21% (3/14) of the studies had entered the clinical validation phase, whereas the remaining 79% (11/14) were still in the exploratory phase of research. Most of the studies (10/14, 71%) were rated as being of moderate methodological quality, with major limitations including lack of a control group, incomplete follow-up data, and inadequate validation systems.

In summary, the integrated application of AI in medical questionnaires has the potential to improve diagnostic efficiency, accelerate scale development, and promote early intervention. Future research should pay more attention to model interpretability, system compatibility, validation standardization, and ethical governance to address key challenges such as data privacy, clinical integration, and transparency.
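The proportions reported in the abstract can be recomputed from the stated counts. The following is a minimal sketch for checking that arithmetic; the helper `percent` and the variable names are illustrative assumptions and are not taken from the review itself.

```python
def percent(part: int, whole: int, digits: int = 0) -> float:
    """Return part/whole as a percentage, rounded to `digits` decimal places."""
    return round(100 * part / whole, digits)


# Counts as stated in the abstract.
screened = 49_091          # publications retrieved from the 5 databases
included = 14              # studies meeting the inclusion criteria
clinically_validated = 3   # studies in the clinical validation phase
exploratory = 11           # studies still in the exploratory phase
moderate_quality = 10      # studies rated as moderate methodological quality

inclusion_rate = percent(included, screened, 2)
validated_share = percent(clinically_validated, included)
exploratory_share = percent(exploratory, included)
moderate_share = percent(moderate_quality, included)

print(f"Included: {inclusion_rate}% of screened publications")
print(f"Clinical validation: {validated_share}% of included studies")
print(f"Exploratory: {exploratory_share}% of included studies")
print(f"Moderate quality: {moderate_share}% of included studies")
```

Under these counts the recomputed values agree with the rounded figures given in the abstract (0.03%, 21%, 79%, and 71%).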