A Novel Model Using ML Techniques for Clinical Trial Design and Expedited Patient Onboarding Process.

Author information

Iyer Abhirvey, Narayanaswami Sundaravalli

Affiliations

Department of Pharmaceutical Engineering & Technology, Indian Institute of Technology (BHU) Varanasi, Varanasi, Uttar Pradesh, India.

Public Systems Group, Indian Institute of Management Ahmedabad, Ahmedabad, Gujarat, India.

Publication information

Clinicoecon Outcomes Res. 2025 Jan 16;17:1-18. doi: 10.2147/CEOR.S479603. eCollection 2025.

DOI:10.2147/CEOR.S479603

PMID:39839913

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11745069/

Abstract

INTRODUCTION

Clinical trials are critical for drug development and patient care; however, their trial design and patient enrolment processes often need to be more efficient. This research explores the integration of machine learning (ML) techniques to address these challenges. Specifically, the study investigates ML models for two aspects: (1) streamlining clinical trial design parameters (such as the site of drug action and the type of interventional or observational model) and (2) optimizing patient and volunteer enrolment for trials through efficient classification techniques.

METHODS

The study used two datasets. The first, with 55,000 samples from ClinicalTrials.gov, was divided into five subsets (10,000 to 15,000 rows each) for model evaluation, focusing on trial parameter optimization. The second, from the UCI ML Repository, targeted patient eligibility classification. Five ML models were applied to both datasets: XGBoost, Random Forest, Support Vector Classifier (SVC), Logistic Regression, and Decision Tree. Artificial Neural Networks (ANN) were additionally applied to the second dataset. Model performance was evaluated using precision, recall, balanced accuracy, ROC-AUC, and weighted F1-score, with results averaged across k-fold cross-validation.
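
As an illustration of the evaluation protocol described above, the following is a minimal pure-Python sketch of balanced accuracy and weighted F1-score averaged over k folds. It is not the authors' code. The function names, the unshuffled contiguous fold split, and the default k = 5 are assumptions made for the example; the abstract does not state the value of k or whether the folds were stratified.

```python
from collections import Counter
from statistics import mean


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        support = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / support)
    return mean(recalls)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support."""
    total = 0.0
    for c, n in Counter(y_true).items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = n - tp
        denom = 2 * tp + fp + fn
        total += n * (2 * tp / denom if denom else 0.0)
    return total / len(y_true)


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous, near-equal folds.

    Assumption: no shuffling or stratification; requires n_samples >= k.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_validate(fit_predict, X, y, k=5):
    """Average both metrics over k folds.

    fit_predict(X_train, y_train, X_test) is a hypothetical callable that
    trains a model and returns predicted labels for X_test.
    """
    bal, f1 = [], []
    for train, test in k_fold_indices(len(y), k):
        preds = fit_predict(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in test]
        )
        y_test = [y[i] for i in test]
        bal.append(balanced_accuracy(y_test, preds))
        f1.append(weighted_f1(y_test, preds))
    return {"balanced_accuracy": mean(bal), "weighted_f1": mean(f1)}
```

Balanced accuracy and weighted F1 are both sensitive to class imbalance in different ways: the first weights every class equally, while the second weights each class by its frequency, which is one plausible reason for reporting them together.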

RESULTS

In the first phase, XGBoost and Random Forest emerged as the best-performing models across all five subsets, achieving an average balanced accuracy of 0.71 and an average ROC-AUC of 0.7. Analysis of the second dataset showed that while SVC and ANN both performed well, ANN was preferred for its scalability to larger datasets. ANN achieved a test accuracy of 0.73714, demonstrating its potential for real-world use in streamlining patient enrolment.
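
To make the ROC-AUC figure easier to interpret, the sketch below computes binary ROC-AUC as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. This is a general property of the metric, not the authors' implementation. The function name and the restriction to labels 0 and 1 are assumptions, and the abstract does not say how ROC-AUC was computed for any multi-class targets.

```python
def roc_auc(y_true, scores):
    """Binary ROC-AUC via pairwise comparison of positive and negative scores.

    Assumes labels are 0 (negative) and 1 (positive). Runs in O(P * N) time,
    which is fine for illustration but slow for large datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))
```

Under this reading, a value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so an average of 0.7 means a positive case outranks a negative case in roughly seven out of ten pairs.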

DISCUSSION

The study highlights the effectiveness of ML models in improving clinical trial workflows. XGBoost and Random Forest demonstrated robust performance for large clinical datasets in optimizing trial parameters, while ANN proved advantageous for patient eligibility classification due to its scalability. These findings underscore the potential of ML to enhance decision-making, reduce delays, and improve the accuracy of clinical trial outcomes. As ML technology continues to evolve, its integration into clinical research could drive innovation and improve patient care.
