Vazquez Janette, Abdelrahman Samir, Byrne Loretta M, Russell Michael, Harris Paul, Facelli Julio C
Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA.
Computer Science Department, Faculty of Computers and Artificial Intelligence, Cairo University, Giza, Egypt.
J Clin Transl Sci. 2020 Sep 4;5(1):e42. doi: 10.1017/cts.2020.535.
Lack of participation in clinical trials (CTs) is a major barrier to the evaluation of new pharmaceuticals and devices. Here we report the results of analysing a dataset from ResearchMatch, an online clinical registry, using supervised machine learning approaches and a deep learning approach to identify characteristics of individuals who are more likely to express interest in participating in CTs.
We trained six supervised machine learning classifiers (Logistic Regression (LR), Decision Tree (DT), Gaussian Naïve Bayes (GNB), K-Nearest Neighbor Classifier (KNC), AdaBoost Classifier (ABC) and Random Forest Classifier (RFC)), as well as a deep learning method, a Convolutional Neural Network (CNN), on a dataset of 841,377 instances and 20 features, including demographic data, geographic constraints, medical conditions and ResearchMatch visit history. The outcome variable was each participant's response ('yes' or 'no') indicating interest when presented with an invitation to a specific clinical trial opportunity. We also created four subsets of this dataset based on the top self-reported medical conditions and on gender, and analysed each subset separately.
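Logistic Regression, the first classifier listed, models the probability of a 'yes' response as a sigmoid of a weighted sum of the features. The following is a minimal pure-Python sketch of that idea, fitted by batch gradient descent on the log loss. It is not the authors' pipeline: the function names, learning rate, epoch count and the two toy predictors (hypothetical scaled stand-ins for features such as prior visits or distance to a trial site) are all assumptions for illustration, and a dataset of 841,377 instances would more likely be fitted with an optimised library implementation.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function: maps a real score to (0, 1).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X: Sequence[Sequence[float]], y: Sequence[int],
                              lr: float = 0.1, epochs: int = 500) -> List[float]:
    """Fit weights by batch gradient descent on the mean log loss.

    y holds 1 for 'yes' (interested) and 0 for 'no'. The returned list has
    one weight per feature, followed by the intercept as its last entry.
    """
    n_features = len(X[0])
    w = [0.0] * (n_features + 1)  # last entry is the intercept
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            # zip(w, xi) stops after the feature weights, skipping the intercept
            z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
            err = sigmoid(z) - yi  # d(log loss)/dz for one instance
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        # Step against the averaged gradient
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    # Predicted probability of a 'yes' response for one instance.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


# Hypothetical toy data: two scaled predictors per instance, not study data.
X_toy = [[0.0, 1.0], [0.2, 0.8], [0.1, 0.9], [0.9, 0.1], [0.8, 0.3], [1.0, 0.0]]
y_toy = [0, 0, 0, 1, 1, 1]
weights = train_logistic_regression(X_toy, y_toy)
p_yes = predict_proba(weights, [0.9, 0.1])  # resembles the 'yes' examples
p_no = predict_proba(weights, [0.1, 0.9])   # resembles the 'no' examples
```

On this toy data the fitted model assigns a higher 'yes' probability to instances that resemble the positive examples; the same fit-then-score pattern applies to the other five classifiers, which differ only in how they map features to a score.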
The deep learning model outperformed the machine learning classifiers, achieving an area under the curve (AUC) of 0.8105.
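Assuming the reported AUC is the area under the ROC curve, as is standard for binary classifiers, it equals the probability that a randomly chosen 'yes' instance receives a higher predicted score than a randomly chosen 'no' instance, with ties counted as one half; 0.5 corresponds to chance. A minimal pure-Python sketch of that pairwise definition follows. The function name and the example labels and scores are hypothetical, not values from the study.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via its pairwise (Mann-Whitney) definition.

    Counts the fraction of (positive, negative) pairs in which the positive
    instance is scored higher; tied pairs count as 0.5.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = 'yes', 0 = 'no') and model scores.
auc_example = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 3 of 4 pairs ordered correctly
```

This quadratic loop is only meant to make the definition concrete; for hundreds of thousands of instances a rank-based computation of the same statistic would be far faster.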
The results provide sufficient evidence of meaningful correlations between the predictor variables and the outcome variable in the datasets analysed with the supervised machine learning classifiers. These approaches show promise for identifying individuals who may be more likely to participate when offered the opportunity to join a clinical trial.