Pisarcik Dusan, Kissling Marc, Heimer Jakob, Farkas Monika, Leo Cornelia, Kubik-Huch Rahel A, Euler André
Department of Radiology, Kantonsspital Baden, affiliated Hospital for Research and Teaching of the Faculty of Medicine of the University of Zurich, Baden, Switzerland (D.P., M.K., J.H., M.F., R.A.K.H., A.E.).
Department of Gynecology, Interdisciplinary Breast Center, Kantonsspital Baden, affiliated Hospital for Research and Teaching of the Faculty of Medicine of the University of Zurich, Baden, Switzerland (C.L.).
Acad Radiol. 2025 Sep;32(9):4988-4996. doi: 10.1016/j.acra.2025.05.065. Epub 2025 Jun 19.
This study aimed to evaluate the interpretability and patient perception of AI-translated mammography and sonography reports by means of a survey, focusing on comprehensibility, follow-up recommendations, and conveyed empathy.
In this observational study, three fictional mammography and sonography reports with BI-RADS categories 3, 4, and 5 were created. Each report was repeatedly translated into plain language by three different large language models (LLMs: ChatGPT-4, ChatGPT-4o, Google Gemini). In a first step, two breast imaging experts selected the best of these repeated translations for each BI-RADS category and LLM based on factual correctness, completeness, and quality. In a second step, female participants compared and rated the translated reports on comprehensibility, follow-up recommendations, conveyed empathy, and the additional value of each report in a survey with Likert scales. Statistical analysis included cumulative link mixed models and the Plackett-Luce model for ranking preferences.
Forty women participated in the survey. GPT-4 and GPT-4o were rated significantly higher than Gemini across all categories (P<.001). Participants >50 years of age rated the reports significantly higher than participants aged 18-29 years (P<.05). Higher education predicted lower ratings (P=.02), no prior mammography predicted higher scores (P=.03), and AI experience had no effect (P=.88). In the Plackett-Luce ranking analysis, GPT-4o was the most preferred (worth 0.48), followed by GPT-4 (0.37), with Gemini ranked last (0.15).
Patient preference differed among AI-translated radiology reports. Compared with a traditional report written in radiological language, AI-translated reports added value for patients and enhanced comprehensibility and empathy, and they therefore hold the potential to improve patient communication in breast imaging.
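The Plackett-Luce model used for the ranking analysis assigns each report version a latent "worth"; the worths, normalized to sum to 1, give the probability that each version is ranked first (as in the 0.48/0.37/0.15 result above). A minimal sketch of fitting such a model with Hunter's minorization-maximization (MM) updates, on hypothetical rankings rather than the study's actual data, might look like this:

```python
from collections import Counter

def fit_plackett_luce(rankings, n_items, iters=200):
    """Estimate Plackett-Luce worths via Hunter's MM algorithm.

    rankings: list of tuples, each a full ranking of item ids 0..n_items-1,
              best-ranked item first.
    Returns worths normalized to sum to 1.
    """
    w = [1.0] * n_items
    # W[i]: number of times item i is ranked above at least one other item
    # (i.e., appears in any position except last)
    W = Counter()
    for r in rankings:
        for item in r[:-1]:
            W[item] += 1
    for _ in range(iters):
        denom = [0.0] * n_items
        for r in rankings:
            # At stage j, item r[j] is chosen from the remaining set r[j:];
            # every remaining item accumulates 1 / (sum of remaining worths).
            tail = sum(w[i] for i in r)
            for j in range(len(r) - 1):
                inv = 1.0 / tail
                for i in r[j:]:
                    denom[i] += inv
                tail -= w[r[j]]
        w = [W[i] / denom[i] if denom[i] > 0 else 0.0 for i in range(n_items)]
        total = sum(w)
        w = [x / total for x in w]
    return w

# Hypothetical example with three report versions (NOT the study's data):
# 0 = GPT-4o, 1 = GPT-4, 2 = Gemini; each tuple is one rater's order, best first.
rankings = [(0, 1, 2)] * 5 + [(1, 0, 2)] * 3 + [(0, 2, 1)]
worths = fit_plackett_luce(rankings, n_items=3)
# worths sum to 1; the most frequently top-ranked version gets the largest worth
```

The function name, the toy rankings, and the item coding are illustrative assumptions; the study itself fitted the model on 40 participants' rankings.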