The University of Texas at Austin, Dell Medical School, Department of Diagnostic Medicine, Austin, TX, USA.
The University of Texas at Austin, Austin, TX, USA.
Clin Radiol. 2024 Nov;79(11):e1366-e1371. doi: 10.1016/j.crad.2024.08.019. Epub 2024 Aug 22.
This study evaluated the readability of existing patient education materials and explored the potential of generative AI tools, such as ChatGPT-4 and Google Gemini, to simplify these materials to a sixth-grade reading level, in accordance with guidelines.
Seven patient education documents were selected from a major radiology group. ChatGPT-4 and Gemini were provided with the documents and asked to reformulate them to target a sixth-grade reading level. The average reading level (ARL) and the proportional word count (PWC) change were calculated, and one-sample t-tests were conducted (significance threshold p = 0.05). Three radiologists assessed the materials on a Likert scale for appropriateness, relevance, clarity, and information retention.
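As a minimal illustration of this kind of analysis, the sketch below computes a reading level and tests a set of simplified outputs against the sixth-grade target. The abstract does not name the readability formula, the statistical software, or exactly what each per-document test compared; the Flesch-Kincaid Grade Level (via the third-party textstat package) and scipy's one-sample t-test are shown here as plausible stand-ins, and all grade-level values are hypothetical.

import textstat                      # third-party readability library (assumed choice)
from scipy import stats

def proportional_word_count_change(original: str, simplified: str) -> float:
    # Fractional change in word count from the original to the simplified text.
    n_orig = len(original.split())
    n_simp = len(simplified.split())
    return (n_simp - n_orig) / n_orig

# Flesch-Kincaid Grade Level of a single (hypothetical) simplified passage.
passage = "The scan uses a strong magnet to take pictures of the inside of your body."
print(textstat.flesch_kincaid_grade(passage))

# Hypothetical grade levels for one model's seven reformulated documents.
grade_levels = [7.1, 6.8, 8.2, 7.5, 6.9, 7.8, 7.0]

# One-sample t-test against the sixth-grade target (alpha = 0.05).
t_stat, p_value = stats.ttest_1samp(grade_levels, popmean=6.0)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")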
The original materials had an ARL of 11.72. The ChatGPT ARL was 7.32 ± 0.76 (6/7 documents significant) and the Gemini ARL was 6.55 ± 0.51 (7/7 documents significant). ChatGPT reduced word count by 15% ± 7%, with 95% of evaluations judging that at least 75% of the information was retained; Gemini reduced word count by 33% ± 7%, with 68% of evaluations judging the same. ChatGPT outputs were rated more appropriate (95% vs. 57%), clearer (92% vs. 67%), and more relevant (95% vs. 76%) than Gemini outputs. Interrater agreement was significantly higher for ChatGPT (0.91) than for Gemini (0.46).
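The abstract reports interrater agreement values but does not name the statistic used. The sketch below is one way such a value could be computed, assuming an agreement measure such as Fleiss' kappa over the three radiologists' Likert ratings; the ratings shown are hypothetical.

import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Rows = rated items, columns = the three radiologists;
# values are Likert scores (hypothetical data for illustration).
ratings = np.array([
    [5, 5, 4],
    [4, 4, 4],
    [5, 5, 5],
    [3, 4, 3],
    [4, 4, 5],
])

# Collapse the rater-by-item labels into an item-by-category count table,
# then compute Fleiss' kappa across the three raters.
table, _ = aggregate_raters(ratings)
print(f"Fleiss' kappa: {fleiss_kappa(table, method='fleiss'):.2f}")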
Generative AI tools significantly enhanced the readability of the patient education materials, which in their original form did not meet the recommended sixth-grade ARL. Radiologist evaluations confirmed the appropriateness and relevance of the AI-simplified texts. This study highlights the capabilities of generative AI tools and the need for ongoing expert review to maintain content accuracy and suitability.