Shamil Eamon, Ko Tsz Ki, Fan Ka Siu, Schuster-Bruce James, Jaafar Mustafa, Khwaja Sadie, Eynon-Lewis Nicholas, D'Souza Alwyn, Andrews Peter
The Royal National ENT Hospital, University College London Hospitals NHS Foundation Trust, London, England, United Kingdom.
Royal Stoke University Hospital, United Kingdom.
Facial Plast Surg. 2024 Oct 15. doi: 10.1055/a-2413-3675.
The evolution of artificial intelligence has introduced new ways to disseminate health information, including natural language processing models such as ChatGPT. However, the quality and readability of such digitally generated information remain understudied. This study is the first to compare the quality and readability of digitally generated health information against leaflets produced by professionals.
Five ENT UK patient information leaflets and their corresponding ChatGPT responses were extracted from the Internet. Assessors with varying degrees of medical knowledge evaluated the content using the Ensuring Quality Information for Patients (EQIP) tool and readability measures including the Flesch-Kincaid Grade Level (FKGL). Statistical analysis was performed to identify differences between leaflets, assessors, and sources of information.
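For context, the FKGL estimates the United States school grade required to understand a text; the standard formula (a well-established definition, not restated in the abstract itself) is FKGL = 0.39 × (total words / total sentences) + 11.8 × (total syllables / total words) − 15.59. Higher scores therefore indicate material that demands a higher reading age.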
ENT UK leaflets were of moderate quality, with a median EQIP score of 23. Statistically significant differences in overall EQIP score were identified between ENT UK leaflets, whereas ChatGPT responses were of uniform quality. Nonspecialist doctors gave the highest EQIP scores, while medical students gave the lowest. The mean readability of ENT UK leaflets was higher than that of ChatGPT responses. The information metrics of ENT UK leaflets were moderate and varied between topics. Equivalent ChatGPT information provided comparable content quality but reduced readability.
ChatGPT patient information and professionally produced leaflets had comparable content, but large language model content required a higher reading age. With the increasing use of online health resources, this study highlights the need for a balanced approach that considers both the quality and readability of patient education materials.