Utilizing Large Language Models for Enhanced Clinical Trial Matching: A Study on Automation in Patient Screening.

Authors

Beattie Jacob, Neufeld Sarah, Yang Daniel, Chukwuma Christian, Gul Ahmed, Desai Neil, Jiang Steve, Dohopolski Michael

Affiliation

Department of Radiation Oncology, University of Texas (UT) Southwestern Medical Center, Dallas, USA.

Publication

Cureus. 2024 May 10;16(5):e60044. doi: 10.7759/cureus.60044. eCollection 2024 May.

DOI:10.7759/cureus.60044

PMID:38854210

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11162699/

Abstract

Background: Clinical trial matching, essential for advancing medical research, involves detailed screening of potential participants to ensure alignment with specific trial requirements. Research staff face challenges due to the high volume of eligible patients and the complexity of varying eligibility criteria. The traditional manual process is both time-consuming and error-prone, and it often leads to missed opportunities. Recently, large language models (LLMs), specifically generative pre-trained transformers (GPTs), have become impressive and impactful tools. Applying such artificial intelligence (AI) and natural language processing (NLP) tools may improve the accuracy and efficiency of this process by automatically screening patients against established criteria.

Methods: We used 202 longitudinal patient records from the National NLP Clinical Challenges (n2c2) 2018 Challenge. These records were annotated by medical professionals and evaluated against 13 selection criteria encompassing various health assessments. Our approach involved embedding medical documents into a vector database to determine relevant document sections, then using an LLM (OpenAI's GPT-3.5 Turbo and GPT-4) together with structured and chain-of-thought prompting techniques to assess the documents systematically against the criteria. Misclassified criteria were also examined to identify classification challenges.

Results: With GPT-3.5 Turbo, the approach achieved an accuracy of 0.81, a sensitivity of 0.80, a specificity of 0.82, and a micro F1 score of 0.79. With GPT-4, it achieved an accuracy of 0.87, a sensitivity of 0.85, a specificity of 0.89, and a micro F1 score of 0.86. Notably, some criteria in the ground truth appeared mislabeled, an issue we could not explore further due to insufficient label generation guidelines on the website.

Conclusion: Our findings underscore the potential of AI and NLP technologies, including LLMs, in the clinical trial matching process. The study demonstrated strong capabilities in identifying eligible patients and minimizing false inclusions. Such automated systems promise to alleviate the workload of research staff and improve clinical trial enrollment, thus accelerating the process and enhancing the overall feasibility of clinical research. Further work is needed to determine the potential of this approach when implemented on real clinical data.
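
The retrieve-then-prompt workflow and the reported metrics can be illustrated with a minimal sketch. It is not the authors' implementation. The bag-of-words `embed` function stands in for a real embedding model and vector database, the prompt wording is invented, and the criterion text, record excerpts, and confusion counts are made-up examples. The `f1` value pools all criterion decisions for the "met" class, which is one way to obtain a micro-averaged score; the abstract does not state how its micro F1 was pooled.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_sections(criterion, sections, k=2):
    """Return the k record sections most similar to the criterion text."""
    query = embed(criterion)
    ranked = sorted(sections, key=lambda s: cosine(query, embed(s)), reverse=True)
    return ranked[:k]


def build_prompt(criterion, sections):
    """Structured, chain-of-thought style prompt (wording is illustrative)."""
    context = "\n".join(f"- {s}" for s in sections)
    return (
        f"Criterion: {criterion}\n"
        f"Relevant record excerpts:\n{context}\n"
        "Reason step by step about whether the excerpts satisfy the criterion, "
        "then finish with one line: 'Answer: met' or 'Answer: not met'."
    )


def screening_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, and F1 from pooled confusion counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical record excerpts and criterion, for illustration only.
sections = [
    "Patient reports daily aspirin use for secondary prevention.",
    "Family history is notable for colon cancer in the father.",
    "Creatinine was 1.9 mg/dL, above the upper limit of normal.",
]
criterion = "Use of aspirin to prevent myocardial infarction"

selected = top_k_sections(criterion, sections, k=1)
prompt = build_prompt(criterion, selected)  # this text would be sent to the LLM
metrics = screening_metrics(tp=40, fp=10, tn=45, fn=5)  # made-up counts
print(prompt)
print(metrics)
```

In a fuller pipeline the prompt would be sent to the LLM once per patient and criterion, and the parsed "met" or "not met" answers would be compared with the annotated labels to fill in the confusion counts.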
