Beattie Jacob, Neufeld Sarah, Yang Daniel, Chukwuma Christian, Gul Ahmed, Desai Neil, Jiang Steve, Dohopolski Michael
Department of Radiation Oncology, University of Texas (UT) Southwestern Medical Center, Dallas, USA.
Cureus. 2024 May 10;16(5):e60044. doi: 10.7759/cureus.60044. eCollection 2024 May.
Background: Clinical trial matching, which is essential for advancing medical research, involves detailed screening of potential participants to ensure that they meet specific trial requirements. Research staff face challenges due to the high volume of eligible patients and the complexity of varying eligibility criteria. The traditional manual process is both time-consuming and error-prone, and it often leads to missed opportunities. Recently, large language models (LLMs), specifically generative pre-trained transformers (GPTs), have become impressive and impactful tools. Applying such tools from artificial intelligence (AI) and natural language processing (NLP) may improve the accuracy and efficiency of this process by automatically screening patients against established criteria.

Methods: We used 202 longitudinal patient records from the National NLP Clinical Challenges (n2c2) 2018 Challenge. These records were annotated by medical professionals and evaluated against 13 selection criteria encompassing various health assessments. Our approach involved embedding medical documents into a vector database to identify relevant document sections, and then using an LLM (OpenAI's GPT-3.5 Turbo and GPT-4) together with structured and chain-of-thought prompting techniques to systematically assess the documents against the criteria. Misclassified criteria were also examined to identify classification challenges.

Results: Using GPT-3.5 Turbo, this study achieved an accuracy of 0.81, a sensitivity of 0.80, a specificity of 0.82, and a micro F1 score of 0.79. Using GPT-4, it achieved an accuracy of 0.87, a sensitivity of 0.85, a specificity of 0.89, and a micro F1 score of 0.86. Notably, some criteria in the ground truth appeared mislabeled, an issue we could not explore further because the website provided insufficient guidelines on how the labels were generated.

Conclusion: Our findings underscore the potential of AI and NLP technologies, including LLMs, in the clinical trial matching process. The study demonstrated strong capabilities in identifying eligible patients and minimizing false inclusions. Such automated systems promise to alleviate the workload of research staff and improve clinical trial enrollment, thus accelerating the process and enhancing the overall feasibility of clinical research. Further work is needed to determine the potential of this approach when implemented on real clinical data.
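The retrieve-then-prompt approach described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the embedding model, the vector database, or the prompt wording, so everything below is an assumption. A bag-of-words count vector stands in for a learned embedding, an in-memory list stands in for the vector database, and the record excerpts, the criterion text, and the helper names (`embed`, `cosine`, `retrieve`, `build_prompt`) are hypothetical. No LLM is called; the sketch stops at the prompt that would be sent.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(chunks: list[str], query: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (in-memory 'vector store')."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)
    return ranked[:k]


def build_prompt(criterion: str, evidence: list[str]) -> str:
    """Assemble a structured prompt that asks for reasoning first, then a verdict."""
    notes = "\n".join(f"- {e}" for e in evidence)
    return (
        f"Criterion: {criterion}\n"
        f"Relevant record excerpts:\n{notes}\n"
        "Think step by step about whether the excerpts satisfy the criterion.\n"
        "Finish with one line: ANSWER: met or ANSWER: not met"
    )


# Hypothetical record sections and criterion, invented for illustration only.
chunks = [
    "Labs: hemoglobin A1c 8.2 percent, creatinine 0.9.",
    "Social history: patient denies alcohol or drug use.",
    "Plan: follow up with cardiology in three months.",
]
criterion = "Any hemoglobin A1c value between 6.5 and 9.5 percent"

top = retrieve(chunks, criterion, k=1)   # most relevant section for this criterion
prompt = build_prompt(criterion, top)    # text that would be sent to the LLM
print(prompt)
```

In a real system, `embed` would be replaced by a learned embedding model and the sorted list by a vector database query. The fixed final `ANSWER:` line is one way to keep free-form chain-of-thought output machine-parseable for each of the 13 criteria.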