Wilhelm Christoph, Steckelberg Anke, Rebitschek Felix G
International Graduate Academy (InGrA), Institute of Health and Nursing Science, Medical Faculty, Martin Luther University Halle-Wittenberg, Magdeburger Str. 8, Halle (Saale) 06112, Germany.
Harding Center for Risk Literacy, Faculty of Health Sciences Brandenburg, University of Potsdam, Virchowstr. 2, Potsdam 14482, Germany.
Lancet Reg Health Eur. 2024 Dec 1;48:101145. doi: 10.1016/j.lanepe.2024.101145. eCollection 2025 Jan.
Despite notable advancements in artificial intelligence (AI) that enable complex systems to perform certain tasks more accurately than medical experts, their impact on patient-relevant outcomes remains uncertain. To address this gap, this systematic review assesses the benefits and harms associated with AI-related algorithmic decision-making (ADM) systems used by healthcare professionals, compared to standard care.
In accordance with the PRISMA guidelines, we included interventional and observational studies published as peer-reviewed full-text articles that met the following criteria: human patients; interventions involving ADM systems developed with and/or utilizing machine learning (ML); and outcomes describing patient-relevant benefits and harms that directly affect health and quality of life, such as mortality and morbidity. Studies that were not preregistered, lacked a standard-of-care control, or pertained to systems that assist in the execution of actions (e.g., in robotics) were excluded. We searched MEDLINE, EMBASE, IEEE Xplore, and Google Scholar for studies published in the past decade up to 31 March 2024. We assessed risk of bias using Cochrane's RoB 2 and ROBINS-I tools, and reporting transparency with CONSORT-AI and TRIPOD-AI. Two researchers independently managed the processes and resolved disagreements through discussion. This review has been registered with PROSPERO (CRD42023412156) and the study protocol has been published.
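The eligibility criteria above can be read as a simple conjunction of inclusion conditions and exclusion conditions. The following is a minimal sketch of that logic, not the authors' screening procedure: the `Study` record, its field names, and `is_eligible` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Study:
    """Hypothetical screening record; field names are illustrative assumptions."""
    peer_reviewed_full_text: bool
    human_patients: bool
    ml_based_adm_intervention: bool
    patient_relevant_outcome: bool   # e.g. mortality or morbidity
    preregistered: bool
    standard_care_control: bool
    assists_action_execution: bool   # e.g. robotics


def is_eligible(study: Study) -> bool:
    """Apply the inclusion and exclusion criteria as stated in the abstract."""
    included = (
        study.peer_reviewed_full_text
        and study.human_patients
        and study.ml_based_adm_intervention
        and study.patient_relevant_outcome
    )
    excluded = (
        not study.preregistered
        or not study.standard_care_control
        or study.assists_action_execution
    )
    return included and not excluded


# A study meeting every criterion, and the same study without preregistration.
candidate = Study(True, True, True, True, True, True, False)
unregistered = Study(True, True, True, True, False, True, False)
print(is_eligible(candidate), is_eligible(unregistered))
```

In this sketch a single failed exclusion check is enough to drop a study, which mirrors how the abstract lists the exclusion reasons as alternatives.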
Out of 2,582 records identified after deduplication, 18 randomized controlled trials (RCTs) and one cohort study met the inclusion criteria, covering specialties such as psychiatry, oncology, and internal medicine. The studies included a median of 243 patients (IQR 124-828), with a median of 50.5% female participants (range 12.5%-79.0%, IQR 43.6%-53.6%) across intervention and control groups. Four studies were classified as having low risk of bias, seven showed some concerns, and another seven were assessed as having high or serious risk of bias. Reporting transparency varied considerably: six studies showed high compliance, four moderate, and five low compliance with CONSORT-AI or TRIPOD-AI. Twelve studies (63%) reported patient-relevant benefits. In the studies with low risk of bias, interventions reduced length of stay in hospital and in the intensive care unit (10.3 vs. 13.0 days, p = 0.042; 6.3 vs. 8.4 days, p = 0.030), in-hospital mortality (9.0% vs. 21.3%, p = 0.018), and depression symptoms in non-complex cases (45.1% vs. 52.3%, p = 0.03). However, harms were frequently underreported, with only eight studies (42%) documenting adverse events. No study reported an increase in adverse events as a result of the interventions.
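The reported percentages follow from the 19 included studies (18 RCTs plus one cohort study). A short arithmetic check, using only counts stated above, might look like the sketch below; the variable names and the `share` helper are illustrative. The category counts for risk of bias and for reporting compliance, as given, sum to fewer than 19, and the text does not state how the remaining studies were classified.

```python
# Counts are taken from the text above; names are illustrative.
total_studies = 18 + 1  # 18 RCTs plus one cohort study


def share(count: int, total: int) -> int:
    """Return count/total as a percentage rounded to a whole number."""
    return round(100 * count / total)


benefit_pct = share(12, total_studies)  # studies reporting patient-relevant benefits
harm_pct = share(8, total_studies)      # studies documenting adverse events

# Category counts as listed in the text.
risk_of_bias_rated = 4 + 7 + 7          # low, some concerns, high or serious
compliance_rated = 6 + 4 + 5            # high, moderate, low

print(benefit_pct, harm_pct)
print(risk_of_bias_rated, compliance_rated)
```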
The current evidence on AI-related ADM systems provides limited insights into patient-relevant outcomes. Our findings underscore the essential need for rigorous evaluations of clinical benefits, reinforced compliance with methodological standards, and balanced consideration of both benefits and harms to ensure meaningful integration into healthcare practice.
This study did not receive any funding.