Huh Ki Young, Song Ildae, Kim Yoonjin, Park Jiyeon, Ryu Hyunwook, Koh JaeEun, Yu Kyung-Sang, Kim Kyung Hwan, Lee SeungHwan
Department of Clinical Pharmacology and Therapeutics, Seoul National University College of Medicine and Hospital, Seoul, Republic of Korea.
Department of Pharmaceutical Science and Technology, Kyungsung University, Busan, Republic of Korea.
Clin Transl Sci. 2025 Mar;18(3):e70183. doi: 10.1111/cts.70183.
Despite interest in clinical trials with decentralized elements (DCTs), analyses of their trends in trial registries are lacking owing to heterogeneous designs and unstandardized terminology. We explored Llama 3, an open-source large language model, to evaluate these trends efficiently. Trial data were sourced from the Aggregate Analysis of ClinicalTrials.gov (AACT) database, focusing on drug trials conducted between 2018 and 2023. We used three Llama 3 models with different parameter counts: 8B (model 1), 8B fine-tuned on curated data (model 2), and 70B (model 3). Prompt engineering enabled sophisticated tasks such as classifying DCTs with explanations and extracting decentralized elements. Model performance, evaluated on a 3-month exploratory test dataset, showed that fine-tuning improved sensitivity from 0.0357 to 0.5385. The low positive predictive value of the fine-tuned model 2 improved from 0.5385 to 0.9167 when screening was restricted to trials containing DCT-associated expressions. However, only model 3, with its larger parameter count, properly extracted decentralized elements. Based on these results, we screened the entire 6-year dataset after pre-filtering for DCT-associated expressions. Subsequent application of models 2 and 3 identified 692 DCTs: 213 were classified as phase 2, followed by 162 phase 4, 112 phase 3, and 92 phase 1 trials. In conclusion, our study demonstrated the potential of large language models for analyzing clinical trial information that is not structured in a machine-readable format. Managing potential biases during model application is crucial.
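The reported performance figures are standard confusion-matrix metrics; a minimal sketch of how they are computed (the counts below are hypothetical, chosen only to illustrate the arithmetic, not taken from the study):

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: fraction of actual DCTs that the model flags."""
    return tp / (tp + fn)

def ppv(tp: int, fp: int) -> float:
    """Positive predictive value: fraction of flagged trials that are true DCTs."""
    return tp / (tp + fp)

# Hypothetical counts from screening an exploratory test set
tp, fp, fn = 14, 12, 12
print(f"sensitivity = {sensitivity(tp, fn):.4f}")
print(f"PPV         = {ppv(tp, fp):.4f}")
```

Pre-filtering on DCT-associated expressions raises PPV by removing trials unlikely to be true positives (shrinking `fp`) before the classifier is applied, at the cost of missing DCTs whose registry text uses none of the chosen expressions.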