Privacy-Preserving Large Language Model for Matching Findings and Tracking Interval Changes in Longitudinal Radiology Reports.

Author information

Mathai Tejas Sudharshan, Kim Boah, Stroie Oana M, Summers Ronald M

Affiliations

Radiology and Imaging Sciences, National Institutes of Health Clinical Center, Building 10, Room 1C224, Bethesda, MD, 20892-1182, USA.

Department of MetaBioHealth, Sungkyunkwan University, Seoul, South Korea.

Publication information

J Imaging Inform Med. 2025 Apr 11. doi: 10.1007/s10278-025-01478-7.

Abstract

In current radiology practice, radiologists identify a finding in the current imaging exam, manually match it against its description in the prior exam report, and assess interval changes. Large Language Models (LLMs) can identify report findings, but their ability to track interval changes has not been tested. The goal of this study was to determine the utility of a privacy-preserving LLM for matching findings between two reports (prior and follow-up) and tracking interval changes in size. In this retrospective study, body MRI reports from NIH (internal) were collected. A two-stage framework was employed for matching findings and tracking interval changes. In Stage 1, the LLM took a sentence from the follow-up report and identified the matching finding in the prior report. In Stage 2, the LLM predicted the interval change status (increase, decrease, or stable) of the matched findings. Seven LLMs were evaluated locally, and the best LLM was validated on an external non-contrast chest CT dataset. Agreement with the reference (radiologist) was measured using Cohen's Kappa (κ). The internal body MRI dataset had 240 studies (120 patients; mean age, 47 ± 16 years; 65 men) and the external non-contrast chest CT dataset contained 134 studies (67 patients; mean age, 58 ± 18 years; 44 men). On the internal dataset, the TenyxChat-7B LLM performed best at matching findings, with an F1-score of 85.4% (95% CI: 80.8, 89.9), outperforming the other LLMs (p < 0.05). For interval change detection, the same LLM achieved a 62.7% F1-score and showed moderate agreement (κ = 0.46, 95% CI: 0.37, 0.55). On the external dataset, the same LLM attained F1-scores of 81.8% (95% CI: 74.4, 89.1) for matching findings and 77.4% for interval change detection, with substantial agreement (κ = 0.64, 95% CI: 0.49, 0.80). The TenyxChat-7B LLM used for matching longitudinal report findings and tracking interval changes showed moderate to substantial agreement with the reference standard. For structured reporting, the LLM can pre-fill the "Findings" section of the next follow-up exam report with a summary of longitudinal changes to important findings. It can also enhance communication between the referring physician and the radiologist.
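
To make the agreement metric concrete, the following is a minimal sketch of how Cohen's Kappa could be computed for the three interval change labels used in Stage 2 (increase, decrease, stable). It is not the authors' evaluation code: the function name `cohens_kappa` and the toy label lists are assumptions made for illustration, and the labels are not data from the study.

```python
from collections import Counter


def cohens_kappa(reference, predicted):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(reference)

    # Observed agreement: fraction of items on which both raters agree.
    p_o = sum(r == p for r, p in zip(reference, predicted)) / n

    # Chance agreement, from each rater's marginal label frequencies.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    labels = set(ref_counts) | set(pred_counts)
    p_e = sum(ref_counts[label] * pred_counts[label] for label in labels) / (n * n)

    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is
        # undefined here, and this sketch reports full agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy labels (radiologist reference vs. LLM prediction).
reference = ["increase", "stable", "stable", "decrease", "stable", "increase"]
predicted = ["increase", "stable", "decrease", "decrease", "stable", "stable"]

kappa = cohens_kappa(reference, predicted)
print(f"kappa = {kappa:.2f}")
```

On this toy input the two raters agree on 4 of 6 items (observed agreement 0.67) while chance agreement is 13/36, so kappa is 11/23, or about 0.48. Under the commonly used Landis and Koch scale, that value falls in the "moderate" band, the same band the abstract reports for the internal dataset (κ = 0.46).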
