Ayo-Ajibola Oluwatobiloba, Davis Ryan J, Lin Matthew E, Vukkadala Neelaysh, O'Dell Karla, Swanson Mark S, Johns Michael M, Shuman Elizabeth A
Keck School of Medicine of the University of Southern California, Los Angeles, California, USA.
Department of Head and Neck Surgery, University of California Los Angeles, Los Angeles, California, USA.
Laryngoscope Investig Otolaryngol. 2024 Jul 16;9(4):e1300. doi: 10.1002/lio2.1300. eCollection 2024 Aug.
Safe home tracheostomy care requires engagement and troubleshooting by patients, who may turn to online, AI-generated information sources. This study assessed the quality of ChatGPT responses to such queries.
In this cross-sectional study, ChatGPT was prompted with 10 hypothetical tracheostomy care questions in three domains (complication management, self-care advice, and lifestyle adjustment). Responses were graded by four otolaryngologists for appropriateness, accuracy, and overall score. The readability of responses was evaluated using the Flesch Reading Ease (FRE) and Flesch-Kincaid Reading Grade Level (FKRGL). Descriptive statistics and ANOVA testing were performed, with statistical significance set at p < .05.
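For readers who want to reproduce this kind of readability and between-group analysis, the sketch below shows one possible workflow; it is not the authors' code. The standard Flesch formulas are FRE = 206.835 − 1.015(words/sentences) − 84.6(syllables/words) and FKRGL = 0.39(words/sentences) + 11.8(syllables/words) − 15.59. The sketch assumes the textstat and scipy Python packages, and the example response text and rater scores are hypothetical placeholders, not study data.

```python
# Minimal sketch (not the study's actual pipeline): score the readability
# of a ChatGPT response and compare rater scores across question domains.
import textstat
from scipy.stats import f_oneway

# Hypothetical ChatGPT response to a tracheostomy care question.
response = (
    "If you notice thick secretions around your tracheostomy tube, "
    "increase humidification and use saline as directed by your care team."
)

# Flesch Reading Ease: higher scores mean easier text.
fre = textstat.flesch_reading_ease(response)
# Flesch-Kincaid Grade Level: approximate US school grade required.
fkrgl = textstat.flesch_kincaid_grade(response)
print(f"FRE = {fre:.1f}, FKRGL = {fkrgl:.1f}")

# One-way ANOVA across the three question domains, using hypothetical
# overall scores from four raters; significance threshold p < .05.
complication = [4.0, 4.5, 3.5, 4.0]
self_care = [4.5, 4.0, 4.0, 4.5]
lifestyle = [3.5, 4.0, 4.5, 4.0]
f_stat, p_value = f_oneway(complication, self_care, lifestyle)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```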
Appropriateness and overall score were rated on a 1-5 scale (5 = highest) and accuracy on a 1-4 scale (4 = highest). Responses exhibited moderately high appropriateness (mean = 4.10, SD = 0.90), high accuracy (mean = 3.55, SD = 0.50), and moderately high overall scores (mean = 4.02, SD = 0.86). Scores did not differ significantly between response categories (self-care recommendations, complication recommendations, lifestyle adjustments, and special device considerations). Suboptimal responses lacked nuance and contained incorrect information and recommendations. Readability fell at the college and advanced levels for both FRE (mean = 39.5, SD = 7.17) and FKRGL (mean = 13.1, SD = 1.47), exceeding the sixth-grade level recommended by the NIH for patient-targeted resources.
While ChatGPT-generated tracheostomy care responses may exhibit acceptable appropriateness, incomplete or misleading information may have dire clinical consequences. Further, inappropriately high reading levels may limit patient comprehension and accessibility. At this point in its technological infancy, AI-generated information should not be solely relied upon as a direct patient care resource.