Ma Sisi, Wang Yinzhao, Wagner John, Johnson Steve, Pakhomov Serguei, Aliferis Constantin
Institute for Health Informatics, University of Minnesota, Minneapolis, MN, 55455, USA.
Medical School, University of Minnesota, Minneapolis, MN, 55455, USA.
Sci Rep. 2025 Jan 31;15(1):3879. doi: 10.1038/s41598-025-88400-x.
Accrual success is a key determinant of clinical trial success. Global data analyses of all terminated trials reported that 55% of trials were terminated due to low accrual rates. Failure to meet accrual goals has a significant impact on costs for sponsors, academic institutions, investigators, and society at large. The ability to predict trial accrual success with high precision before the trial starts would be highly valuable, preventing the allocation of critical resources to trials unlikely to meet accrual goals. In the present study, we constructed a dataset for predicting clinical trial failure due to poor accrual using ClinicalTrials.gov data containing information on 57,846 trials. Features of the dataset were informed by prior literature and constructed using data-driven natural language processing methods. We built predictive models for accrual failure using state-of-the-art supervised machine learning protocols and methods. The models achieved good predictive performance that was stable over a 10-year period, with cross-validation AUC = 0.744 (±0.018) and prospective validation AUC = 0.737 (±0.038). We also improved model calibration and examined model performance with the reject option. These modifications enable model translation into decision support tools for various real-world settings. To the best of our knowledge, this is the first study to develop models for predicting clinical trial failure due to accrual based on a large dataset with a comprehensive set of trial features.
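The abstract reports AUC and an evaluation "with the reject option" but does not spell out the rejection rule. As an illustration only, the sketch below assumes one common form: the model abstains whenever its predicted failure probability falls inside an uncertainty band, and AUC is then computed on the remaining trials. The function names, the band limits (`low`, `high`), and the toy labels and scores are all hypothetical and are not taken from the study.

```python
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def apply_reject_option(
    labels: Sequence[int],
    scores: Sequence[float],
    low: float = 0.4,   # hypothetical lower edge of the abstention band
    high: float = 0.6,  # hypothetical upper edge of the abstention band
) -> Tuple[List[int], List[float], float]:
    """Abstain on scores strictly inside (low, high); return kept data and coverage."""
    kept = [(y, s) for y, s in zip(labels, scores) if not (low < s < high)]
    coverage = len(kept) / len(labels)
    return [y for y, _ in kept], [s for _, s in kept], coverage


# Toy data (hypothetical): 1 = trial failed due to poor accrual, 0 = it did not.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.45, 0.55, 0.2, 0.1, 0.7, 0.3]

auc_all = auc(labels, scores)
kept_labels, kept_scores, coverage = apply_reject_option(labels, scores)
auc_kept = auc(kept_labels, kept_scores)

print(f"AUC on all trials: {auc_all:.4f}")
print(f"AUC on accepted trials: {auc_kept:.4f} at coverage {coverage:.2f}")
```

The trade-off this exposes is the usual one for a reject option: a wider band tends to raise performance on the accepted cases while lowering coverage, so the band would need to be chosen for the decision setting at hand.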