Keck School of Medicine of the University of Southern California, Los Angeles, California, USA.
Caruso Department of Otolaryngology-Head & Neck Surgery, Keck School of Medicine of the University of Southern California, Los Angeles, California, USA.
Laryngoscope. 2024 May;134(5):2252-2257. doi: 10.1002/lary.31191. Epub 2023 Nov 20.
With the burgeoning popularity of artificial intelligence-based chatbots, oropharyngeal cancer patients now have access to a novel source of medical information. Because chatbot information is not reviewed by experts, we sought to evaluate an artificial intelligence-based chatbot's oropharyngeal cancer-related information for accuracy.
Fifteen oropharyngeal cancer-related questions were developed and input into ChatGPT version 3.5. Four physician-graders independently assessed accuracy, comprehensiveness, and similarity to a physician response using 5-point Likert scales. Responses graded lower than three were then critiqued by physician-graders. Critiques were analyzed using inductive thematic analysis. Readability of responses was assessed using Flesch Reading Ease (FRE) and Flesch-Kincaid Reading Grade Level (FKRGL) scales.
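For context, both readability indices are computed from word, sentence, and syllable counts using the standard published Flesch formulas (stated here for general reference; the abstract does not specify the authors' exact scoring tool):
FRE = 206.835 - 1.015 × (total words / total sentences) - 84.6 × (total syllables / total words)
FKRGL = 0.39 × (total words / total sentences) + 11.8 × (total syllables / total words) - 15.59
Higher FRE values indicate easier text, whereas FKRGL approximates the U.S. school grade level needed to comprehend it.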
Mean scores for accuracy, comprehensiveness, and similarity to a physician response were 3.88 (SD = 0.99), 3.80 (SD = 1.14), and 3.67 (SD = 1.08), respectively. Posttreatment-related questions were the most accurate, comprehensive, and similar to a physician response, followed by treatment-related and then diagnosis-related questions. Posttreatment-related questions scored significantly higher than diagnosis-related questions in all three domains (p < 0.01). Two themes emerged from the physician critiques: suboptimal educational value and potential to misinform patients. The mean FRE and FKRGL scores both indicated a readability level above the 11th grade, higher than the 6th grade level recommended for patient materials.
ChatGPT responses may not educate patients to an appropriate degree, could outright misinform them, and read at a more difficult grade level than is recommended for patient material. As oropharyngeal cancer patients represent a vulnerable population facing complex, life-altering diagnoses and treatments, they should be cautious when consuming chatbot-generated medical information.
Level of Evidence: NA. Laryngoscope, 134:2252-2257, 2024.