Sunshine Alexis, Honce Grace H, Callen Andrew L, Zander David A, Tanabe Jody L, Pisani Petrucci Samantha L, Lin Chen-Tan, Honce Justin M
Department of Radiology, University of Colorado Anschutz Medical Campus, 19th Ave. Mail Stop C278, Aurora, CO, 80045, United States.
Hartway Evaluation Group, Denver, CO, United States.
JMIR Form Res. 2025 Aug 27;9:e76097. doi: 10.2196/76097.
BACKGROUND: Radiology reports convey critical medical information to health care providers and patients. Unfortunately, they are often difficult for patients to comprehend, causing confusion and anxiety and thereby limiting patient engagement in health care decision-making. Large language models (LLMs) such as ChatGPT (OpenAI) can create simplified, patient-friendly report summaries to increase accessibility, albeit with errors.

OBJECTIVE: We evaluated the accuracy and clarity of ChatGPT-generated summaries against the original radiology reports, as judged by radiologists; assessed patients' understanding of and satisfaction with the summaries compared to the original reports; and compared the readability of the original reports and summaries using validated readability metrics.

METHODS: We anonymized 30 radiology reports created by neuroradiologists at our institution (6 brain magnetic resonance imaging, 6 brain computed tomography, 6 head and neck computed tomography angiography, 6 neck computed tomography, and 6 spine computed tomography). These anonymized reports were processed by ChatGPT to produce patient-centric summaries. Four board-certified neuroradiologists evaluated the ChatGPT-generated summaries for quality and accuracy against the original reports, and 4 patient volunteers separately rated the reports and summaries on perceived understandability and satisfaction. Readability was assessed using word count and validated readability scales.

RESULTS: After reading the summary, patient confidence in understanding (98%, 116/118 vs 26%, 31/118) and satisfaction regarding the level of jargon/terminology (91%, 107/118 vs 8%, 9/118) and the time taken to understand the content (97%, 115/118 vs 23%, 27/118) substantially improved. Ninety-two percent (108/118) of responses indicated the summary clarified patients' questions about the report, and 98% (116/118) indicated patients would use the summary if available; 67% (79/118) indicated patients would want access to both the report and the summary, while 26% (31/118) indicated wanting only the summary. Eighty-three percent (100/120) of radiologist responses indicated the summary represented the original report "extremely well" or "very well," with only 5% (6/120) indicating it did so "slightly well" or "not well at all." Five percent (6/120) of responses indicated relevant medical information was missing from the summary, 12% (14/120) reported instances of overemphasis of nonsignificant findings, and 18% (22/120) reported instances of underemphasis of significant findings. No fabricated findings were identified. Overall, 83% (99/120) of responses indicated the summary would definitely/probably not lead patients to incorrect conclusions about the original report, while 10% (12/120) indicated it might.

CONCLUSIONS: ChatGPT-generated summaries significantly improved perceived comprehension and satisfaction while accurately reflecting most key information from the original radiology reports. Minor omissions and instances of under- or overemphasis were noted in some summaries, underscoring the need for ongoing validation and oversight. Overall, these artificial intelligence-generated, patient-centric summaries hold promise for enhancing patient-centered communication in radiology.
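The abstract does not name the specific readability scales used; Flesch Reading Ease and Flesch-Kincaid Grade Level are common validated choices for this kind of comparison. The sketch below illustrates how such word-count and readability scoring could be computed for a report versus its summary; the vowel-group syllable counter is a naive approximation (published tools use dictionary-backed syllabification), and the example texts are hypothetical.

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count vowel groups; drop a trailing silent "e";
    # every word has at least one syllable.
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text: str) -> dict:
    """Word count, Flesch Reading Ease, and Flesch-Kincaid Grade Level."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / max(len(sentences), 1)  # words per sentence
    spw = syllables / max(len(words), 1)       # syllables per word
    return {
        "words": len(words),
        # Higher = easier to read (standard Flesch coefficients)
        "flesch_reading_ease": 206.835 - 1.015 * wps - 84.6 * spw,
        # Approximate US school grade level needed to understand the text
        "fk_grade": 0.39 * wps + 11.8 * spw - 15.59,
    }

# Hypothetical report excerpt vs patient-friendly summary:
report = ("MRI demonstrates periventricular white matter hyperintensities "
          "consistent with chronic microangiopathy.")
summary = ("The scan shows small changes in the brain. "
           "These are common with age and are not a cause for alarm.")
```

A jargon-dense report sentence scores far lower on reading ease (and a much higher grade level) than the plain-language summary, which is the direction of effect the study reports.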