Department of Otorhinolaryngology, Faculty of Medicine, Bezmialem Vakif University, Fatih, Istanbul, Turkey.
Department of Radiology, Faculty of Medicine, Bezmialem Vakif University, Fatih, Istanbul, Turkey.
Int J Pediatr Otorhinolaryngol. 2024 Jun;181:111998. doi: 10.1016/j.ijporl.2024.111998. Epub 2024 May 31.
This study examined the potential of ChatGPT as an accurate and readable source of information for parents seeking guidance on adenoidectomy, tonsillectomy, and ventilation tube insertion surgeries (ATVtis).
ChatGPT was tasked with identifying the 15 questions most frequently asked by parents on internet search engines for each of the three surgical procedures. We removed repeated questions from the initial set of 45 and then asked ChatGPT to generate answers to the remaining 33 questions. Seven highly experienced otolaryngologists individually graded the accuracy of each response on a four-level scale ranging from 'completely incorrect' to 'comprehensive.' The readability of the responses was determined using the Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) scores. The questions were categorized into four groups: Diagnosis and Preparation Process, Surgical Information, Risks and Complications, and Postoperative Process. Responses were then compared across these groups by accuracy grade, FRE, and FKGL scores.
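For readers unfamiliar with the two readability metrics, the sketch below illustrates how FRE and FKGL are conventionally computed from words per sentence and syllables per word. It is not part of the study; the abstract does not specify the authors' tooling, and the syllable counter here is a deliberately naive heuristic for demonstration only.

```python
import re

def count_syllables(word: str) -> int:
    """Rough vowel-group heuristic for English syllable counting (illustrative only)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1  # drop a typical silent final 'e'
    return max(count, 1)

def readability(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / max(len(sentences), 1)   # words per sentence
    spw = syllables / max(len(words), 1)        # syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw    # standard FRE formula
    fkgl = 0.39 * wps + 11.8 * spw - 15.59      # standard FKGL formula
    return fre, fkgl

if __name__ == "__main__":
    sample = ("An adenoidectomy removes the adenoids. "
              "Most children go home the same day and recover quickly.")
    fre, fkgl = readability(sample)
    print(f"FRE = {fre:.1f}, FKGL = {fkgl:.1f}")
```

Higher FRE values indicate easier text, while FKGL approximates the US school grade needed to understand it; the AMA's sixth-grade recommendation corresponds roughly to FKGL ≤ 6.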
Seven evaluators each assessed 33 AI-generated responses, yielding 231 evaluations in total. Of these, 167 (72.3%) were classified as 'comprehensive,' 62 (26.8%) as 'correct but inadequate,' and 2 (0.9%) as 'some correct, some incorrect.' No response was judged 'completely incorrect' by any assessor. The mean FRE and FKGL scores were 57.15 (±10.73) and 9.95 (±1.91), respectively. Only 3 of ChatGPT's responses (9.1%) were at or below the sixth-grade reading level recommended by the American Medical Association (AMA). No significant differences were found between the question groups in readability or accuracy scores (p > 0.05).
ChatGPT can provide accurate answers to questions on a range of topics related to ATVtis. However, its answers may be too complex for some readers, as they are generally written at a high-school level, above the sixth-grade reading level recommended for patient information by the AMA. In our study, more than three-quarters of the AI-generated responses were written at or above the 10th-grade reading level, raising concerns about the readability of ChatGPT's output.