Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada.
Michael G. DeGroote School of Medicine, McMaster University, Hamilton, ON, Canada.
J Med Internet Res. 2024 Sep 23;26:e58578. doi: 10.2196/58578.
Evaluation of artificial intelligence (AI) tools in clinical trials remains the gold standard for translation into clinical settings. However, design factors associated with successful trial completion and the common reasons for trial failure are unknown.
This study aims to compare trial design factors of complete and incomplete clinical trials testing AI tools. We conducted a case-control study of complete (n=485) and incomplete (n=51) clinical trials registered on ClinicalTrials.gov that evaluated AI as an intervention.
Trial design factors, including area of clinical application, intended use population, and intended role of AI, were extracted. Trials that did not evaluate AI as an intervention and trials that were still active were excluded. The assessed AI-specific design factors were the domain of clinical application by organ system; the intended use population (patients or health care providers); and the role of AI in patient-facing clinical workflows, such as diagnosis, screening, and treatment. We also assessed general trial design factors, including study type, allocation, intervention model, masking, age, sex, funder, continent, trial duration, sample size, number of enrollment sites, and study start year. The main outcome was trial completion. Odds ratios (ORs) and 95% CIs were calculated for all trial design factors using propensity-matched, multivariable logistic regression.
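The abstract does not describe how the propensity matching was carried out. As a rough illustration only, the sketch below shows greedy 1:1 nearest-neighbor matching without replacement on precomputed propensity scores. The function name `greedy_nearest_neighbor_match`, the 1:1 ratio, the caliper of 0.05, and the trial IDs are hypothetical assumptions, not the authors' procedure. Estimating the scores themselves (for example, with a logistic model of group membership on trial covariates) is assumed to have happened beforehand.

```python
from typing import Dict, List, Optional


def greedy_nearest_neighbor_match(
    scores: Dict[str, float],
    treated: List[str],
    controls: List[str],
    caliper: Optional[float] = 0.05,
) -> Dict[str, str]:
    """Greedy 1:1 nearest-neighbor matching on propensity scores, without replacement.

    Illustrative sketch only, not the study's matching procedure.
    scores:   precomputed propensity score for every unit ID (assumed given).
    treated:  IDs of the index group (e.g., incomplete trials; hypothetical IDs).
    controls: IDs of the comparison pool (e.g., complete trials).
    caliper:  largest allowed absolute score difference; None disables the check.
    Returns a dict mapping each matched index unit to its control.
    """
    available = set(controls)
    matches: Dict[str, str] = {}
    # Visit index units from highest to lowest score (a common heuristic:
    # high-score units often have the fewest close controls).
    for unit in sorted(treated, key=lambda u: scores[u], reverse=True):
        if not available:
            break
        # Sorting the pool makes tie-breaking deterministic.
        best = min(sorted(available), key=lambda c: abs(scores[c] - scores[unit]))
        if caliper is None or abs(scores[best] - scores[unit]) <= caliper:
            matches[unit] = best
            available.remove(best)  # without replacement
    return matches


# Hypothetical scores: "inc*" are incomplete trials, "c*" are complete trials.
scores = {"inc1": 0.30, "inc2": 0.10, "inc3": 0.90,
          "c1": 0.31, "c2": 0.12, "c3": 0.50, "c4": 0.09}
matched = greedy_nearest_neighbor_match(scores, ["inc1", "inc2", "inc3"],
                                        ["c1", "c2", "c3", "c4"])
print(matched)  # {'inc1': 'c1', 'inc2': 'c4'}
```

Here `inc3` stays unmatched because no complete trial lies within the caliper; passing `caliper=None` would force a match regardless of distance, at the cost of poorer covariate balance.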
We queried ClinicalTrials.gov on December 23, 2023, using AI keywords to identify complete and incomplete trials testing AI technologies as a primary intervention, yielding 485 complete and 51 incomplete trials for inclusion in this study. Results of our nested, propensity-matched case-control analysis suggest that trials conducted in Europe were significantly associated with trial completion compared with North American trials (OR 2.85, 95% CI 1.14-7.10; P=.03), and that trial sample size was positively associated with trial completion (OR 1.00, 95% CI 1.00-1.00; P=.02).
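A logistic regression coefficient β converts to an odds ratio as exp(β), and the usual Wald 95% CI is exp(β ± 1.96·SE). The abstract does not state which CI method was used, so the Wald form here is an assumption, and the helper names are hypothetical. The sketch also illustrates how the sample-size result can print as OR 1.00 (95% CI 1.00-1.00) yet be significant: for a continuous predictor the OR refers to a one-unit increase, and exp(k·β) gives the OR for k units. The abstract does not state the unit for sample size (one participant is assumed here), and the coefficient 0.002 is invented purely for illustration.

```python
import math

# Two-sided 95% quantile of the standard normal distribution.
Z_95 = 1.959963984540054


def odds_ratio_with_ci(beta: float, se: float, z: float = Z_95):
    """Wald odds ratio and CI from a logistic-regression coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def log_or_and_se_from_ci(odds_ratio: float, lower: float, upper: float, z: float = Z_95):
    """Approximately recover (beta, SE) from a reported OR and Wald CI.

    Assumes the CI is symmetric on the log scale; rounding in the reported
    values limits the precision of the recovered numbers.
    """
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


def rescale_or(beta_per_unit: float, units: float) -> float:
    """OR for an increase of `units` in a continuous predictor: exp(units * beta)."""
    return math.exp(units * beta_per_unit)


# Europe vs North America, as reported: OR 2.85, 95% CI 1.14-7.10.
beta, se = log_or_and_se_from_ci(2.85, 1.14, 7.10)
print(round(beta, 3), round(se, 3))  # 1.047 0.467

# Hypothetical per-participant coefficient (NOT the study's estimate).
beta_size = 0.002
print(round(rescale_or(beta_size, 1), 2))    # 1.0
print(round(rescale_or(beta_size, 100), 2))  # 1.22
```

Under these assumptions, a per-participant OR that rounds to 1.00 can still correspond to a noticeably larger OR across 100 additional participants, which is why reporting such effects per a larger increment is often easier to read.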
Our case-control study is one of the first to identify trial design factors associated with completion of AI trials and to catalog study-reported reasons for AI trial failure. We observed that conducting the trial in Europe and a larger sample size were positively associated with trial completion. Given the promising clinical use of AI tools in health care, our results suggest that future translational research should prioritize addressing the design factors associated with incompletion of AI clinical trials, as well as the common reasons for study failure.