Department of Computer & Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL, United States of America.
PLoS One. 2021 Jul 12;16(7):e0253789. doi: 10.1371/journal.pone.0253789. eCollection 2021.
As of March 30, 2021, more than 5,193 COVID-19 clinical trials had been registered through ClinicalTrials.gov. Among them, 191 trials were terminated, suspended, or withdrawn (indicating cessation of the study), whereas 909 trials were completed. In this study, we investigate the underlying factors of COVID-19 trial completion vs. cessation, and design predictive models to predict whether a COVID-19 trial will complete or cease in the future. We collect 4,441 COVID-19 trials from ClinicalTrials.gov to build a testbed, and design four types of features to characterize clinical trial administration, eligibility, study information, criteria, drug types, and study keywords, as well as embedding features commonly used in state-of-the-art machine learning. Our study shows that drug features and study keywords are the most informative features, but all four types of features are essential for accurate trial prediction. Using predictive models, our approach achieves an AUC (area under the curve) of more than 0.87 and a balanced accuracy of 0.81 in predicting COVID-19 clinical trial completion vs. cessation. Our research shows that computational methods can deliver effective features for understanding the differences between completed and ceased COVID-19 trials. In addition, such models can predict COVID-19 trial status with satisfactory accuracy, helping stakeholders better plan trials and minimize costs.
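The two metrics reported above can be illustrated with a minimal sketch. With 909 completed and 191 ceased trials the classes are imbalanced, which is presumably why balanced accuracy (the mean of per-class recall) is reported rather than plain accuracy. The labels and scores below are made-up toy values, the function names are hypothetical, and treating "completed" as the positive class is an assumption; none of this is the authors' implementation or data.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of the recall on the positive class and the recall on the negative class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (assumption: 1 = completed, 0 = ceased); not data from the study.
y_true = [1, 1, 1, 1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.3, 0.4, 0.2]

# Threshold the scores at 0.5 to obtain hard class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("balanced accuracy:", balanced_accuracy(y_true, y_pred))
print("AUC:", roc_auc(y_true, scores))
```

Balanced accuracy depends on the chosen decision threshold (0.5 here), whereas AUC is computed from the ranking of the scores alone and is threshold-free.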