Large language models for extracting histopathologic diagnoses of colorectal cancer and dysplasia from electronic health records.

Author information

Johnson Brian, Bath Tyler, Huang Xinyi, Lamm Mark, Earles Ashley, Eddington Hyrum, Dornisch Anna M, Jih Lily J, Gupta Samir, Shah Shailja C, Curtius Kit

Affiliations

Division of Biomedical Informatics, Department of Medicine, University of California San Diego, La Jolla, CA, USA.

Veterans Medical Research Foundation, San Diego, CA, USA.

Publication information

medRxiv. 2025 Apr 22:2024.11.27.24318083. doi: 10.1101/2024.11.27.24318083.

Abstract

BACKGROUND

Accurate data resources are essential for impactful medical research, but available structured datasets are often incomplete or inaccurate. Recent advances in open-weight large language models (LLMs) enable more accurate data extraction from unstructured text in electronic health records (EHRs) but have not yet been thoroughly validated for challenging diagnoses such as inflammatory bowel disease (IBD)-related neoplasia.

OBJECTIVE

Create a validated approach using LLMs for identifying histopathologic diagnoses in pathology reports from the nationwide Veterans Health Administration database, including patients with genotype data within the Million Veteran Program (MVP) biobank.

DESIGN

Our approach uses simple 'yes/no' question prompts for the following phenotypes of interest: any colorectal dysplasia, high-grade dysplasia and/or colorectal adenocarcinoma (HGD/CRC), and invasive CRC. We validated the method on these diagnostic tasks by applying the prompts to reports from patients with IBD (with separate validation in patients without IBD) and calculated F1 scores, the harmonic mean of precision and recall, as a balanced measure of accuracy.
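A minimal sketch of how per-phenotype yes/no prompting could be wired up, assuming a generic text-in, text-out model call. The question wording, the names `PHENOTYPE_QUESTIONS`, `build_prompt`, `parse_yes_no`, and `classify_report`, and the answer-parsing rule are hypothetical illustrations, not the authors' prompts or code. The model (for example, Gemma-2) is passed in as a plain function, so no specific inference library is assumed.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical question wording; the study's exact prompts are not given in the abstract.
PHENOTYPE_QUESTIONS: Dict[str, str] = {
    "any_dysplasia": "Does this pathology report describe any colorectal dysplasia?",
    "hgd_crc": "Does this pathology report describe high-grade dysplasia or colorectal adenocarcinoma?",
    "crc": "Does this pathology report describe invasive colorectal cancer?",
}


def build_prompt(report_text: str, question: str) -> str:
    """Combine one report and one yes/no question into a single prompt string."""
    return (
        "Read the following pathology report and answer the question "
        "with a single word: yes or no.\n\n"
        f"Report:\n{report_text}\n\n"
        f"Question: {question}\nAnswer:"
    )


def parse_yes_no(reply: str) -> Optional[int]:
    """Map a free-text reply to 1 (yes), 0 (no), or None if neither word appears.

    Uses the first standalone 'yes' or 'no'; words like 'not' do not match.
    """
    match = re.search(r"\b(yes|no)\b", reply.strip().lower())
    if match is None:
        return None
    return 1 if match.group(1) == "yes" else 0


def classify_report(
    report_text: str, generate: Callable[[str], str]
) -> Dict[str, Optional[int]]:
    """Ask each phenotype question in a separate prompt and parse each answer."""
    return {
        name: parse_yes_no(generate(build_prompt(report_text, question)))
        for name, question in PHENOTYPE_QUESTIONS.items()
    }
```

Asking one question per prompt keeps each answer a single binary label that can be scored directly against a chart-review reference; replies that cannot be parsed are returned as `None` so they can be reviewed rather than silently counted as negatives.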

RESULTS

In patients with IBD in MVP, we achieved F1 scores of 96.1% (95% CI 92.5-99.4%) for identifying dysplasia, 93.7% (88.2-98.4%) for identifying HGD/CRC, and 98% (96.3-99.4%) for identifying CRC. In patients without IBD in MVP, we achieved F1 scores of 99.2% (98.2-100%) for identifying any colorectal dysplasia, 96.5% (93.0-99.2%) for identifying HGD/CRC, and 95% (92.8-97.2%) for identifying CRC using the LLM Gemma-2.
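The F1 score is the harmonic mean of precision and recall, which simplifies to F1 = 2TP / (2TP + FP + FN). The abstract does not state how the 95% confidence intervals were computed; the sketch below assumes a percentile bootstrap over reports, and the function names, resample count, and seed are hypothetical choices rather than the study's settings.

```python
import random
from typing import List, Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when there are no positives at all."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_f1_ci(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (point F1, lower, upper) using a percentile bootstrap over reports."""
    rng = random.Random(seed)
    n = len(y_true)
    scores: List[float] = []
    for _ in range(n_boot):
        # Resample reports with replacement, keeping each label pair together.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f1_score([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    lower = scores[int((alpha / 2) * n_boot)]
    upper = scores[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return f1_score(y_true, y_pred), lower, upper
```

For example, with reference labels `[1, 1, 1, 0, 0, 0, 1, 0]` and predictions `[1, 1, 0, 0, 0, 1, 1, 0]` there are 3 true positives, 1 false positive, and 1 false negative, so the point estimate is 6 / 8 = 0.75.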

CONCLUSION

LLMs provided excellent accuracy in extracting the diagnoses of interest from EHRs. Our validated methods generalized to unstructured pathology notes, even within the constraints of resource-limited computing environments. Given the minimal human-led development required, this may be a promising approach for other clinical phenotypes.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b8b2/12045448/94f4afa14e86/nihpp-2024.11.27.24318083v2-f0001.jpg