Privacy-Preserving Large Language Model for Matching Findings and Tracking Interval Changes in Longitudinal Radiology Reports.

Author information

Mathai Tejas Sudharshan, Kim Boah, Stroie Oana M, Summers Ronald M

Affiliations

Radiology and Imaging Sciences, National Institutes of Health Clinical Center, Building 10, Room 1C224, Bethesda, MD, 20892-1182, USA.

Department of MetaBioHealth, Sungkyunkwan University, Seoul, South Korea.

Publication information

J Imaging Inform Med. 2025 Apr 11. doi: 10.1007/s10278-025-01478-7.

DOI: 10.1007/s10278-025-01478-7

PMID: 40216673

Abstract

In current radiology practice, radiologists identify a finding in the current imaging exam, manually match it against its description in the prior exam report, and assess interval changes. Large Language Models (LLMs) can identify report findings, but their ability to track interval changes has not been tested. The goal of this study was to determine the utility of a privacy-preserving LLM for matching findings between two reports (prior and follow-up) and tracking interval changes in size. In this retrospective study, body MRI reports were collected from the NIH (internal dataset). A two-stage framework was employed for matching findings and tracking interval changes. In Stage 1, the LLM took a sentence from the follow-up report and identified the matching finding in the prior report. In Stage 2, the LLM predicted the interval change status (increase, decrease, or stable) of the matched findings. Seven LLMs were evaluated locally, and the best-performing LLM was validated on an external non-contrast chest CT dataset. Agreement with the reference standard (radiologist) was measured using Cohen's kappa (κ). The internal body MRI dataset had 240 studies (120 patients; mean age, 47 ± 16 years; 65 men), and the external non-contrast chest CT dataset contained 134 studies (67 patients; mean age, 58 ± 18 years; 44 men). On the internal dataset, TenyxChat-7B outperformed the other LLMs at matching findings, with an F1-score of 85.4% (95% CI: 80.8, 89.9; p < 0.05). For interval change detection, the same LLM achieved an F1-score of 62.7% and showed moderate agreement (κ = 0.46, 95% CI: 0.37, 0.55). On the external dataset, the same LLM attained F1-scores of 81.8% (95% CI: 74.4, 89.1) for matching findings and 77.4% for interval change detection, with substantial agreement (κ = 0.64, 95% CI: 0.49, 0.80). The TenyxChat-7B LLM used for matching longitudinal report findings and tracking interval changes showed moderate to substantial agreement with the reference standard.
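The abstract reports F1-scores and Cohen's kappa against the radiologist reference. As a rough illustration of how such agreement figures are computed, the Python sketch below implements both from first principles. The function names and toy labels are hypothetical, and mapping κ to words such as "moderate" and "substantial" assumes the Landis and Koch bands, which match the wording used above. The abstract does not say how F1 was aggregated across the three change classes, and confidence intervals are not computed here.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(reference: Sequence[str], predicted: Sequence[str]) -> float:
    """Cohen's kappa between two raters labeling the same items (hypothetical helper)."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(reference)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(r == p for r, p in zip(reference, predicted)) / n
    # Chance agreement from each rater's marginal label frequencies.
    ref_counts, pred_counts = Counter(reference), Counter(predicted)
    p_e = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    if p_e == 1.0:
        return 1.0  # convention: both raters used one identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def interpret_kappa(kappa: float) -> str:
    """Map kappa to the Landis and Koch descriptive bands (assumed convention)."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Toy interval-change labels: radiologist reference vs. model output.
    ref = ["increase", "stable", "stable", "decrease", "stable", "increase"]
    pred = ["increase", "stable", "increase", "decrease", "stable", "stable"]
    k = cohens_kappa(ref, pred)
    print(round(k, 3), interpret_kappa(k))  # prints: 0.455 moderate
```

Kappa is used instead of raw accuracy because it discounts the agreement expected by chance, which matters when one label (often "stable") dominates.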
For structured reporting, the LLM can pre-fill the "Findings" section of the next follow-up exam report with a summary of longitudinal changes to important findings. It can also improve communication between the referring physician and the radiologist.
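The two-stage framework, and the pre-filled "Findings" summary it could feed, might be wired together as in the sketch below. This is a hypothetical outline, not the authors' implementation: `ask_llm` stands in for any locally hosted model (for example TenyxChat-7B) wrapped as a prompt-to-text function, and the prompt wording, the `NONE` sentinel, the summary line format, and the `toy_llm` stand-in are all invented for illustration.

```python
from typing import Callable, List, Optional

# Hypothetical interface: a locally hosted LLM wrapped as prompt -> completion text.
AskLLM = Callable[[str], str]
CHANGE_LABELS = ("increase", "decrease", "stable")


def match_finding(ask_llm: AskLLM, followup_sentence: str, prior_report: str) -> Optional[str]:
    """Stage 1 (illustrative prompt): return the prior-report sentence for the same finding, or None."""
    prompt = (
        "Prior report:\n" + prior_report + "\n\n"
        "Follow-up sentence: " + followup_sentence + "\n"
        "Return the sentence from the prior report that describes the same finding, "
        "or NONE if there is no match."
    )
    answer = ask_llm(prompt).strip()
    return None if not answer or answer.upper() == "NONE" else answer


def classify_change(ask_llm: AskLLM, prior_sentence: str, followup_sentence: str) -> Optional[str]:
    """Stage 2 (illustrative prompt): label the size change of a matched finding."""
    prompt = (
        "Prior: " + prior_sentence + "\n"
        "Follow-up: " + followup_sentence + "\n"
        "Did the finding increase, decrease, or remain stable in size? "
        "Answer with one word: increase, decrease, or stable."
    )
    answer = ask_llm(prompt).strip().lower()
    # Discard anything outside the three allowed labels instead of guessing.
    return answer if answer in CHANGE_LABELS else None


def summarize_changes(ask_llm: AskLLM, prior_report: str, followup_sentences: List[str]) -> str:
    """Run both stages per follow-up sentence and build draft "Findings" summary lines."""
    lines = []
    for sentence in followup_sentences:
        prior = match_finding(ask_llm, sentence, prior_report)
        if prior is None:
            continue  # no prior counterpart, so no interval change to report
        change = classify_change(ask_llm, prior, sentence)
        if change is not None:
            lines.append(f"- {sentence} [{change} vs. prior]")
    return "\n".join(lines)


def toy_llm(prompt: str) -> str:
    """Toy stand-in for a real model, for demonstration only: keyword match, fixed label."""
    if prompt.startswith("Prior report:"):
        followup_part = prompt.split("Follow-up sentence:", 1)[1]
        return "Liver lesion measures 2.0 cm." if "Liver" in followup_part else "NONE"
    return "increase"


if __name__ == "__main__":
    prior = "Liver lesion measures 2.0 cm. Spleen is normal."
    followup = ["Liver lesion measures 2.6 cm.", "New 5 mm lung nodule."]
    print(summarize_changes(toy_llm, prior, followup))
    # prints: - Liver lesion measures 2.6 cm. [increase vs. prior]
```

Keeping each stage as a separate call mirrors the two-stage split described in the abstract: an unmatched follow-up sentence (here, the new lung nodule) never reaches Stage 2, and an off-list answer from Stage 2 is dropped rather than written into the draft report.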

Similar articles

Automatic structuring of radiology reports with on-premise open-source large language models.

Eur Radiol. 2025 Apr;35(4):2018-2029. doi: 10.1007/s00330-024-11074-y. Epub 2024 Oct 10.

Automated Radiology Report Labeling in Chest X-Ray Pathologies: Development and Evaluation of a Large Language Model Framework.

JMIR Med Inform. 2025 Mar 28;13:e68618. doi: 10.2196/68618.

Privacy-ensuring Open-weights Large Language Models Are Competitive with Closed-weights GPT-4o in Extracting Chest Radiography Findings from Free-Text Reports.

Radiology. 2025 Jan;314(1):e240895. doi: 10.1148/radiol.240895.

Feasibility of Using the Privacy-preserving Large Language Model Vicuna for Labeling Radiology Reports.

Radiology. 2023 Oct;309(1):e231147. doi: 10.1148/radiol.231147.

MRI spine request form enhancement and auto protocoling using a secure institutional large language model.

Spine J. 2025 Mar;25(3):505-514. doi: 10.1016/j.spinee.2024.10.021. Epub 2024 Nov 12.

From jargon to clarity: Improving the readability of foot and ankle radiology reports with an artificial intelligence large language model.

Foot Ankle Surg. 2024 Jun;30(4):331-337. doi: 10.1016/j.fas.2024.01.008. Epub 2024 Feb 5.

Aligning large language models with radiologists by reinforcement learning from AI feedback for chest CT reports.

Eur J Radiol. 2025 Mar;184:111984. doi: 10.1016/j.ejrad.2025.111984. Epub 2025 Feb 6.

Accuracy of Large Language Model-based Automatic Calculation of Ovarian-Adnexal Reporting and Data System MRI Scores from Pelvic MRI Reports.

Radiology. 2025 Apr;315(1):e241554. doi: 10.1148/radiol.241554.

From technical to understandable: Artificial Intelligence Large Language Models improve the readability of knee radiology reports.

Knee Surg Sports Traumatol Arthrosc. 2024 May;32(5):1077-1086. doi: 10.1002/ksa.12133. Epub 2024 Mar 15.
