The Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, Canada.
Provincial Research Data Services, Alberta Health Services, Calgary, Canada.
BMC Med Inform Decis Mak. 2024 Oct 3;24(1):283. doi: 10.1186/s12911-024-02677-y.
The primary goal of this study is to evaluate the capabilities of large language models (LLMs) in understanding and processing complex medical documentation. We focused on identifying pathologic complete response (pCR) in narrative pathology reports. This approach aims to support comprehensive reporting, health research, and public health surveillance, thereby improving patient care and breast cancer management strategies.
The study used two analytical pipelines, both developed with open-source LLMs within the healthcare system's computing environment. First, we extracted embeddings from pathology reports with 15 different transformer-based models and then applied logistic regression to these embeddings to classify the presence or absence of pCR. Second, we fine-tuned the Generative Pre-trained Transformer-2 (GPT-2) model with a simple feed-forward neural network (FFNN) classification layer attached, to improve the detection of pCR from pathology reports.
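The two pipelines can be illustrated with minimal sketches. The first sketch shows the embedding-plus-logistic-regression approach; the model name ("bert-base-uncased"), mean pooling, and the placeholder reports and labels are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

# Placeholder reports and labels (1 = pCR, 0 = residual disease); real inputs
# would be de-identified narrative pathology reports with chart-reviewed labels.
reports = [
    "No residual invasive carcinoma identified in the breast or lymph nodes.",
    "Residual invasive ductal carcinoma, 1.2 cm, with nodal involvement.",
]
labels = np.array([1, 0])

MODEL_NAME = "bert-base-uncased"  # stand-in for one of the 15 transformer backbones
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)
encoder.eval()

def embed(texts, max_length=512):
    """Mean-pool the last hidden state to get one fixed-length vector per report."""
    vectors = []
    with torch.no_grad():
        for text in texts:
            inputs = tokenizer(text, truncation=True, max_length=max_length,
                               return_tensors="pt")
            hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, dim)
            vectors.append(hidden.mean(dim=1).squeeze(0).numpy())
    return np.vstack(vectors)

# Logistic regression on the report embeddings; in practice this would be
# trained and evaluated with cross-validation or a held-out test set.
clf = LogisticRegression(max_iter=1000)
clf.fit(embed(reports), labels)
print(clf.predict(embed(reports)))
```

The second sketch shows one plausible way to attach an FFNN head to GPT-2 for fine-tuning; the hidden size, pooling of the last non-padded token, and class names are assumptions for illustration rather than the authors' exact architecture.

```python
import torch
import torch.nn as nn
from transformers import GPT2Tokenizer, GPT2Model

class GPT2PCRClassifier(nn.Module):
    """GPT-2 backbone with a small feed-forward head for binary pCR detection."""
    def __init__(self, hidden_size=128):
        super().__init__()
        self.backbone = GPT2Model.from_pretrained("gpt2")
        self.head = nn.Sequential(
            nn.Linear(self.backbone.config.n_embd, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, 2),  # logits for: no pCR, pCR
        )

    def forward(self, input_ids, attention_mask):
        hidden = self.backbone(input_ids=input_ids,
                               attention_mask=attention_mask).last_hidden_state
        # GPT-2 has no [CLS] token, so pool the last non-padded token's state.
        last_idx = attention_mask.sum(dim=1) - 1
        pooled = hidden[torch.arange(hidden.size(0)), last_idx]
        return self.head(pooled)

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 defines no pad token by default

model = GPT2PCRClassifier()
batch = tokenizer(["No residual invasive carcinoma identified."],
                  padding=True, truncation=True, max_length=512,
                  return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])
loss = nn.CrossEntropyLoss()(logits, torch.tensor([1]))  # 1 = pCR
loss.backward()  # gradients flow through both the GPT-2 backbone and the head
```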
In a cohort of 351 female breast cancer patients who underwent neoadjuvant chemotherapy (NAC) and subsequent surgery between 2010 and 2017 in Calgary, the optimized method achieved a sensitivity of 95.3% (95% CI: 84.0-100.0%), a positive predictive value of 90.9% (95% CI: 76.5-100.0%), and an F1 score of 93.0% (95% CI: 83.7-100.0%). These results, obtained by integrating diverse LLMs, surpassed those of traditional machine learning models, underscoring the potential of LLMs for clinical pathology information extraction.
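For reference, the reported metrics follow directly from the confusion-matrix counts. The sketch below computes sensitivity, positive predictive value, and F1, with percentile-bootstrap confidence intervals; the bootstrap choice is an assumption for illustration, as the abstract does not state the interval method used.

```python
import numpy as np

def metrics(y_true, y_pred):
    """Sensitivity, positive predictive value, and F1 from binary labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    sens = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * ppv * sens / (ppv + sens) if ppv + sens else 0.0
    return sens, ppv, f1

def bootstrap_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CIs for (sensitivity, PPV, F1) -- illustrative only."""
    rng = np.random.default_rng(seed)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))
        stats.append(metrics(y_true[idx], y_pred[idx]))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0)
    return list(zip(lo, hi))
```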
The study demonstrates the efficacy of LLMs in interpreting and processing digital pathology data, particularly for determining pCR in breast cancer patients after NAC. The superior performance of the LLM-based pipelines over traditional models indicates their considerable potential for extracting and analyzing key clinical data from narrative reports. While promising, these findings underscore the need for external validation to confirm the reliability and broader applicability of these methods.