Analysis of eligibility criteria clusters based on large language models for clinical trial design.

Author Information

Bornet Alban, Khlebnikov Philipp, Meer Florian, Haas Quentin, Yazdani Anthony, Zhang Boya, Amini Poorya, Teodoro Douglas

Affiliations

Department of Radiology and Medical Informatics, University of Geneva, 1202 Geneva, Switzerland.

Risklick AG, 3013 Bern, Switzerland.

Publication Information

J Am Med Inform Assoc. 2025 Mar 1;32(3):447-458. doi: 10.1093/jamia/ocae311.

Abstract

OBJECTIVES

Clinical trials (CTs) are essential for improving patient care by evaluating new treatments' safety and efficacy. A key component in CT protocols is the study population defined by the eligibility criteria. This study aims to evaluate the effectiveness of large language models (LLMs) in encoding eligibility criterion information to support CT-protocol design.

MATERIALS AND METHODS

We extracted eligibility criterion sections, phases, conditions, and interventions from CT protocols available in the ClinicalTrials.gov registry. Eligibility sections were split into individual rules using a criterion tokenizer and embedded using LLMs. The obtained representations were clustered. The quality and relevance of the clusters for protocol design were evaluated through three experiments: intrinsic evaluation of alignment with protocol information and of cluster coherence by human experts, extrinsic evaluation through CT-level classification tasks, and eligibility section generation.
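
As a rough illustration of this pipeline, the sketch below splits an eligibility section into criterion rules, embeds each rule with a sentence-level model, and clusters the embeddings. The sentence-transformers and scikit-learn libraries, the "all-MiniLM-L6-v2" checkpoint, the cluster count, and the example criteria are illustrative assumptions, not the models or settings used in the study.

# Minimal sketch of the criterion-embedding-and-clustering pipeline.
# Assumptions: sentence-transformers + scikit-learn; the checkpoint, cluster
# count, and example criteria are placeholders, not the study's settings.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

# An eligibility section already split into individual criterion rules.
criteria = [
    "Age 18 years or older",
    "Histologically confirmed non-small cell lung cancer",
    "ECOG performance status 0-1",
    "No prior systemic chemotherapy",
    "Adequate renal and hepatic function",
]

# Embed each criterion rule with a sentence-level LLM.
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(criteria)

# Cluster the criterion embeddings; each cluster groups semantically similar rules.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(embeddings)

for rule, label in zip(criteria, labels):
    print(f"cluster {label}: {rule}")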

RESULTS

Sentence embeddings fine-tuned on biomedical corpora produce the clusters with the highest alignment to CT-level information. Human expert evaluation confirms that the clusters are well structured and coherent. Despite the high degree of information compression, clusters retain substantial CT information, achieving up to 97% of the classification performance obtained with raw embeddings. Finally, eligibility sections automatically generated from the clusters achieve 95% of the ROUGE scores obtained with a generative LLM prompted with CT-protocol details, suggesting that the clusters encapsulate information useful for CT-protocol design.
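
As a rough illustration of the ROUGE comparison above, the snippet below scores a generated eligibility section against a reference section using the rouge-score package; the package choice and the toy texts are illustrative assumptions, not the study's actual evaluation setup.

# Hedged sketch: ROUGE overlap between a generated and a reference eligibility section.
# Assumes the rouge-score package; the example texts are placeholders.
from rouge_score import rouge_scorer

reference = (
    "Inclusion criteria: adults aged 18 years or older with a confirmed diagnosis. "
    "Exclusion criteria: prior systemic chemotherapy."
)
generated = (
    "Inclusion criteria: patients aged 18 or older with confirmed diagnosis. "
    "Exclusion criteria: previous systemic chemotherapy."
)

# Compute ROUGE-1 and ROUGE-L F1 scores for the generated section.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, generated)
print(scores["rouge1"].fmeasure, scores["rougeL"].fmeasure)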

DISCUSSION

Clusters derived from sentence-level LLM embeddings effectively summarize complex eligibility criterion data while retaining relevant CT-protocol details. Clustering-based approaches provide a scalable enhancement to CT design, balancing information compression with accuracy.

CONCLUSIONS

Clustering eligibility criteria using LLM embeddings provides a practical and efficient method to summarize critical protocol information. We provide an interactive visualization of the pipeline here.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/01ae/11833473/5ea8d0d879c5/ocae311f1.jpg
