

Quantitative Evaluation of Large Language Models to Streamline Radiology Report Impressions: A Multimodal Retrospective Analysis.

Affiliations

From the Yale School of Medicine (R.D., P.K.) and Department of Radiology and Biomedical Imaging (K.S.A., S.S.B., S.C., H.P.F.), Yale School of Medicine, 333 Cedar St, New Haven, CT 06510; Yale School of Management, New Haven, Conn (H.P.F.); and Department of Health Policy and Management, Yale School of Public Health, New Haven, Conn (H.P.F.).

Publication Information

Radiology. 2024 Mar;310(3):e231593. doi: 10.1148/radiol.231593.

Abstract

Background The complex medical terminology of radiology reports may cause confusion or anxiety for patients, especially given increased access to electronic health records. Large language models (LLMs) can potentially improve radiology report readability. Purpose To compare the performance of four publicly available LLMs (ChatGPT-3.5, ChatGPT-4, Bard [now known as Gemini], and Bing) in producing simplified radiology report impressions. Materials and Methods In this retrospective comparative analysis of the four LLMs (accessed July 23 to July 26, 2023), the Medical Information Mart for Intensive Care (MIMIC)-IV database was used to gather 750 anonymized radiology report impressions covering a range of imaging modalities (MRI, CT, US, radiography, mammography) and anatomic regions. Three distinct prompts were employed to assess the LLMs' ability to simplify report impressions. The first prompt (prompt 1) was "Simplify this radiology report." The second prompt (prompt 2) was "I am a patient. Simplify this radiology report." The last prompt (prompt 3) was "Simplify this radiology report at the 7th grade level." Each prompt was followed by the radiology report impression and was queried once. The primary outcome was simplification as assessed by readability score. Readability was assessed using the average of four established readability indexes. The nonparametric Wilcoxon signed-rank test was applied to compare reading grade levels across LLM output. Results All four LLMs simplified radiology report impressions across all prompts tested (P < .001). Within prompts, differences were found between LLMs. Providing the context of being a patient or requesting simplification at the seventh-grade level reduced the reading grade level of output for all models and prompts (except prompt 1 to prompt 2 for ChatGPT-4) (P < .001).
Conclusion Although the success of each LLM varied depending on the specific prompt wording, all four models simplified radiology report impressions across all modalities and prompts tested. © RSNA, 2024 See also the editorial by Rahsepar in this issue.
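The evaluation pipeline described in Materials and Methods, scoring each impression by the average of four readability indexes and comparing paired reading grade levels with the Wilcoxon signed-rank test, can be sketched as follows. The abstract does not name the four indexes used, so the choice of Flesch-Kincaid grade, Gunning fog, SMOG, and Coleman-Liau here is an assumption, and the syllable counter is a crude vowel-group heuristic rather than a dictionary-based count.

```python
import math
import re


def _syllables(word: str) -> int:
    # Crude syllable estimate: count runs of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def reading_grade(text: str) -> float:
    """Average of four common readability indexes (Flesch-Kincaid grade,
    Gunning fog, SMOG, Coleman-Liau). The specific set of four indexes
    is an assumption; the paper's abstract does not list them."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    n_sent, n_words = len(sentences), len(words)
    syllables = sum(_syllables(w) for w in words)
    polysyllables = sum(1 for w in words if _syllables(w) >= 3)
    letters = sum(len(w) for w in words)

    fk = 0.39 * n_words / n_sent + 11.8 * syllables / n_words - 15.59
    fog = 0.4 * (n_words / n_sent + 100 * polysyllables / n_words)
    smog = 1.043 * math.sqrt(polysyllables * 30 / n_sent) + 3.1291
    cli = 5.88 * letters / n_words - 29.6 * n_sent / n_words - 15.8
    return (fk + fog + smog + cli) / 4


def wilcoxon_signed_rank(before, after):
    """Two-sided Wilcoxon signed-rank test via the normal approximation,
    applied to paired grade levels before/after simplification."""
    diffs = [b - a for b, a in zip(before, after) if b != a]
    n = len(diffs)
    # Rank absolute differences, averaging ranks across ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p value
    return w_plus, p
```

In the study's design, `reading_grade` would be applied to each original impression and to each LLM output, and the paired grade levels fed to the signed-rank test; note that the normal approximation is only appropriate for reasonably large samples (the study used 750 impressions), and real readability tools (e.g., the `textstat` package) use more careful syllable counting.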

