Integrating Large language models into radiology workflow: Impact of generating personalized report templates from summary.

Author information

Gupta Amit, Hussain Manzoor, Nikhileshwar Kondaveeti, Rastogi Ashish, Rangarajan Krithika

Affiliations

Department of Radiology, Dr B.R.A.IRCH, All India Institute of Medical Sciences, New Delhi, India.

Department of Radiodiagnosis, All India Institute of Medical Sciences, New Delhi, India.

Publication information

Eur J Radiol. 2025 Aug;189:112198. doi: 10.1016/j.ejrad.2025.112198. Epub 2025 May 25.

Abstract

OBJECTIVE

To evaluate the feasibility of using large language models (LLMs) to convert radiologist-generated report summaries into personalized report templates, and to assess the impact of this approach on scan reporting time and quality.

MATERIALS AND METHODS

In this retrospective study, 100 CT scans from oncology patients were randomly divided into two equal sets. Two radiologists generated conventional reports for one set and summary reports for the other, and vice versa. Three LLMs (GPT-4, Google Gemini, and Claude Opus) generated complete reports from the summaries using institution-specific generic templates. Two expert radiologists qualitatively evaluated the radiologist summaries and the LLM-generated reports with the ACR RADPEER scoring system, taking the conventional radiologist reports as the reference. Reporting time was compared between conventional and summary-based reports, and the LLM-generated reports were analyzed for errors. Quantitative similarity and linguistic metrics were computed to assess how closely each model's reports aligned with the original radiologist-generated report summaries. Statistical analyses were performed using Python 3.0 to identify significant differences in reporting times, error rates, and quantitative metrics.
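The allocation described above can be illustrated with a minimal sketch. It assumes that "vice versa" means the second radiologist received the opposite report type for each set; the abstract does not spell this out. The function name, reader labels, and seed are hypothetical and are not taken from the study.

```python
import random


def crossover_assignment(scan_ids, seed=0):
    """Randomly split scans into two equal sets and assign report types per reader.

    Assumption: reader 1 writes conventional reports for set A and summaries
    for set B, while reader 2 does the reverse.
    """
    shuffled = list(scan_ids)
    if len(shuffled) % 2 != 0:
        raise ValueError("need an even number of scans for two equal sets")
    rng = random.Random(seed)  # hypothetical fixed seed, for reproducibility only
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    set_a, set_b = shuffled[:half], shuffled[half:]
    return {
        "radiologist_1": {"conventional": set_a, "summary": set_b},
        "radiologist_2": {"conventional": set_b, "summary": set_a},
    }


# 100 scans, identified here simply by the integers 1..100
plan = crossover_assignment(range(1, 101))
```

Under this reading, every scan receives one conventional report and one summary report, so each summary has a conventional reference from the other reader.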

RESULTS

The average reporting time was significantly shorter for the summary method (6.76 min) than for the conventional method (8.95 min) (p < 0.005). Among the 100 radiologist summaries, 10 received RADPEER scores worse than 1, of which three were deemed to have clinically significant discrepancies. Only one LLM-generated report received a worse RADPEER score than its corresponding summary. Error frequencies among LLM-generated reports did not differ significantly across models (χ² = 1.146, p = 0.564), and template-related errors were the most common. Quantitative analysis indicated significant differences in similarity and linguistic metrics among the three LLMs (p < 0.05), reflecting distinct generation patterns for each model.
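As a quick arithmetic check on the figures above, the sketch below recomputes the p-value for the reported chi-square statistic of 1.146 and the time saved per report. It assumes 2 degrees of freedom (three models compared); the abstract does not state the degrees of freedom. For 2 degrees of freedom the chi-square upper-tail probability has the closed form exp(-x/2), so only the standard library is needed. The function name is hypothetical.

```python
import math


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


# Assumption: df = 2, i.e. a comparison across the three LLMs
p_value = chi2_sf_df2(1.146)

# Mean reporting times quoted in the abstract, in minutes
conventional_min = 8.95
summary_min = 6.76
saved_min = conventional_min - summary_min
saved_pct = 100.0 * saved_min / conventional_min

print(f"p = {p_value:.3f}")  # p = 0.564
print(f"time saved = {saved_min:.2f} min ({saved_pct:.1f}%)")  # 2.19 min (24.5%)
```

The recomputed p-value agrees with the reported p = 0.564, which is consistent with the statistic being a chi-square value on 2 degrees of freedom. The difference in mean reporting time is about 2.19 minutes per scan, roughly a quarter of the conventional reporting time.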

CONCLUSION

Summary-based scan reporting, combined with the use of LLMs to generate complete personalized report templates, can shorten reporting time while maintaining report quality. However, human oversight is still needed to address errors in the generated reports.

RELEVANCE STATEMENT

Summary-based reporting of radiological studies along with the use of large language models to generate tailored reports using generic templates has the potential to make the workflow more efficient by shortening the reporting time while maintaining the quality of reporting.
