From jargon to clarity: Improving the readability of foot and ankle radiology reports with an artificial intelligence large language model.

Author information

Butler James J, Harrington Michael C, Tong Yixuan, Rosenbaum Andrew J, Samsonov Alan P, Walls Raymond J, Kennedy John G

Affiliations

Foot and Ankle Division, Department of Orthopaedic Surgery, NYU Langone Health, 171 Delancey St, 2nd floor, New York City, USA.

Department of Orthopedic Surgery, Albany Medical Center, Albany, New York, USA.

Publication information

Foot Ankle Surg. 2024 Jun;30(4):331-337. doi: 10.1016/j.fas.2024.01.008. Epub 2024 Feb 5.

Abstract

BACKGROUND

The purpose of this study was to evaluate the efficacy of an Artificial Intelligence Large Language Model (AI-LLM) at improving the readability of foot and ankle orthopedic radiology reports.

METHODS

The radiology reports from 100 foot or ankle X-Rays, 100 computed tomography (CT) scans and 100 magnetic resonance imaging (MRI) scans were randomly sampled from the institution's database. The following prompt command was inserted into the AI-LLM: "Explain this radiology report to a patient in layman's terms in the second person: [Report Text]". The mean report length, Flesch reading ease score (FRES) and Flesch-Kincaid reading level (FKRL) were evaluated for both the original radiology report and the AI-LLM generated report. The accuracy of the information contained within the AI-LLM report was assessed via a 5-point Likert scale. Additionally, any "hallucinations" generated by the AI-LLM report were recorded.
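For readers unfamiliar with the two readability metrics, the following is a minimal sketch of how FRES and FKRL can be computed from sentence, word and syllable counts using the standard published Flesch and Flesch-Kincaid formulas. The abstract does not state which software the authors used, so the function names, the tokenization and especially the vowel-group syllable heuristic are assumptions made for illustration only; dedicated readability tools count syllables more carefully and may give slightly different scores.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (an assumption, not the authors' method):
    count vowel groups and discount a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_counts(text: str) -> tuple[int, int, int]:
    """Return (sentences, words, syllables) for a block of English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables


def flesch_reading_ease(sentences: int, words: int, syllables: int) -> float:
    """FRES: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(sentences: int, words: int, syllables: int) -> float:
    """FKRL: approximate US school grade needed to read the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    # Hypothetical sample sentence, not taken from the study data.
    sample = "Your ankle bones look normal. There is no sign of a break."
    counts = text_counts(sample)
    print(round(flesch_reading_ease(*counts), 1))
    print(round(flesch_kincaid_grade(*counts), 1))
```

Both formulas depend only on average sentence length and average syllables per word, which is why rewriting a report with shorter sentences and plainer vocabulary raises FRES and lowers FKRL at the same time.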

RESULTS

There was a statistically significant improvement in mean FRES scores in the AI-LLM generated X-Ray report (33.8 ± 6.8 to 72.7 ± 5.4), CT report (27.8 ± 4.6 to 67.5 ± 4.9) and MRI report (20.3 ± 7.2 to 66.9 ± 3.9), all p < 0.001. There was also a statistically significant improvement in mean FKRL scores in the AI-LLM generated X-Ray report (12.2 ± 1.1 to 8.5 ± 0.4), CT report (15.4 ± 2.0 to 8.4 ± 0.6) and MRI report (14.1 ± 1.6 to 8.5 ± 0.5), all p < 0.001. Superior FRES scores were observed in the AI-LLM generated X-Ray report compared to the AI-LLM generated CT report and MRI report, p < 0.001. The mean Likert score for the AI-LLM generated X-Ray report, CT report and MRI report was 4.0 ± 0.3, 3.9 ± 0.4, and 3.9 ± 0.4, respectively. The rate of hallucinations in the AI-LLM generated X-Ray report, CT report and MRI report was 4%, 7% and 6%, respectively.

CONCLUSION

AI-LLM was an efficacious tool for improving the readability of foot and ankle radiological reports across multiple imaging modalities. Superior FRES scores together with superior Likert scores were observed in the X-Ray AI-LLM reports compared to the CT and MRI AI-LLM reports. This study demonstrates the potential use of AI-LLMs as a new patient-centric approach for enhancing patient understanding of their foot and ankle radiology reports. Jel Classifications: IV.
