Harvard Medical School, Boston, Massachusetts.
Department of Medicine, Yale University, New Haven, Connecticut.
JAMA Netw Open. 2022 Sep 1;5(9):e2233946. doi: 10.1001/jamanetworkopen.2022.33946.
IMPORTANCE: Despite the potential of machine learning to improve multiple aspects of patient care, barriers to clinical adoption remain. Randomized clinical trials (RCTs) are often a prerequisite to large-scale clinical adoption of an intervention, and important questions remain regarding how machine learning interventions are being incorporated into clinical trials in health care.
OBJECTIVE: To systematically examine the design, reporting standards, risk of bias, and inclusivity of RCTs for medical machine learning interventions.
EVIDENCE REVIEW: In this systematic review, the Cochrane Library, Google Scholar, Ovid Embase, Ovid MEDLINE, PubMed, Scopus, and Web of Science Core Collection online databases were searched and citation chasing was done to find relevant articles published from the inception of each database to October 15, 2021. Search terms for machine learning, clinical decision-making, and RCTs were used. Exclusion criteria included implementation of a non-RCT design, absence of original data, and evaluation of nonclinical interventions. Data were extracted from published articles. Trial characteristics, including primary intervention, demographics, adherence to the CONSORT-AI reporting guideline, and Cochrane risk of bias were analyzed.
FINDINGS: Literature search yielded 19 737 articles, of which 41 RCTs involved a median of 294 participants (range, 17-2488 participants). A total of 16 RCTs (39%) were published in 2021, 21 (51%) were conducted at single sites, and 15 (37%) involved endoscopy. No trials adhered to all CONSORT-AI standards. Common reasons for nonadherence were not assessing poor-quality or unavailable input data (38 trials [93%]), not analyzing performance errors (38 [93%]), and not including a statement regarding code or algorithm availability (37 [90%]). Overall risk of bias was high in 7 trials (17%). Of 11 trials (27%) that reported race and ethnicity data, the median proportion of participants from underrepresented minority groups was 21% (range, 0%-51%).
CONCLUSIONS AND RELEVANCE: This systematic review found that despite the large number of medical machine learning-based algorithms in development, few RCTs for these technologies have been conducted. Among published RCTs, there was high variability in adherence to reporting standards and risk of bias and a lack of participants from underrepresented minority groups. These findings merit attention and should be considered in future RCT design and reporting.