Division of Vascular and Interventional Radiology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts.
J Vasc Interv Radiol. 2023 Oct;34(10):1760-1768.e32. doi: 10.1016/j.jvir.2023.05.037. Epub 2023 Jun 16.
To assess the accuracy, completeness, and readability of patient educational material produced by a machine learning model and compare the output to that provided by a societal website.
Content from the Society of Interventional Radiology Patient Center website was retrieved, categorized, and organized into discrete questions. These questions were entered into the ChatGPT platform, and the output was analyzed for word and sentence counts, readability on multiple validated scales, factual correctness, and suitability for patient education using the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P).
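As a rough illustration of the readability analysis described above, the following minimal sketch computes the Flesch-Kincaid grade level from its published formula (0.39 × words per sentence + 11.8 × syllables per word − 15.59). The syllable counter is a simple vowel-group heuristic for illustration only, the sample sentence is hypothetical, and the study's validated scales would rely on more rigorous implementations.

import re

def count_syllables(word):
    # Crude heuristic: count runs of consecutive vowels.
    # Validated readability tools use dictionary-based syllable counts.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    # Published FKGL formula:
    # 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

# Hypothetical patient-education sentence, for demonstration only.
sample = ("An angiogram is an imaging test that uses X-rays "
          "to view the blood vessels in your body.")
print(round(flesch_kincaid_grade(sample), 1))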
A total of 21,154 words were analyzed, including 7,917 words from the website and 13,377 words representing the total output of the ChatGPT platform across 22 text passages. Compared with the societal website, output from the ChatGPT platform was longer and was rated more difficult to read on 4 of 5 readability scales. The ChatGPT output was incorrect for 12 (11.5%) of 104 questions. When reviewed using the PEMAT-P tool, the ChatGPT content scored lower than the website material. Content from both the website and ChatGPT was significantly above the recommended fifth or sixth grade reading level for patient education, with a mean Flesch-Kincaid grade level of 11.1 (±1.3) for the website and 11.9 (±1.6) for the ChatGPT content.
The ChatGPT platform may produce incomplete or inaccurate patient educational content, and providers should be familiar with the limitations of the system in its current form. Opportunities may exist to fine-tune existing large language models, which could be optimized for the delivery of patient educational content.