
Evaluation of Oropharyngeal Cancer Information from Revolutionary Artificial Intelligence Chatbot.

Affiliations

Keck School of Medicine of the University of Southern California, Los Angeles, California, USA.

Caruso Department of Otolaryngology-Head & Neck Surgery, Keck School of Medicine of the University of Southern California, Los Angeles, California, USA.

Publication Information

Laryngoscope. 2024 May;134(5):2252-2257. doi: 10.1002/lary.31191. Epub 2023 Nov 20.

Abstract

OBJECTIVE

With the burgeoning popularity of artificial intelligence-based chatbots, oropharyngeal cancer patients now have access to a novel source of medical information. Because chatbot information is not reviewed by experts, we sought to evaluate the accuracy of an artificial intelligence-based chatbot's oropharyngeal cancer-related information.

METHODS

Fifteen oropharyngeal cancer-related questions were developed and input into ChatGPT version 3.5. Four physician-graders independently assessed accuracy, comprehensiveness, and similarity to a physician response using 5-point Likert scales. Responses graded lower than three were then critiqued by physician-graders. Critiques were analyzed using inductive thematic analysis. Readability of responses was assessed using Flesch Reading Ease (FRE) and Flesch-Kincaid Reading Grade Level (FKRGL) scales.
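For context, FRE and FKRGL are standard formulas over word, sentence, and syllable counts: FRE = 206.835 - 1.015 x (words/sentence) - 84.6 x (syllables/word), and FKRGL = 0.39 x (words/sentence) + 11.8 x (syllables/word) - 15.59. The paper does not specify the authors' tooling; below is a minimal sketch of how such scoring could be reproduced with the open-source textstat Python package, using a hypothetical chatbot response as input.

```python
# Sketch: scoring a chatbot response for readability (assumes `pip install textstat`).
# This is an illustration, not the authors' stated implementation.
import textstat

# Hypothetical example of a chatbot-generated answer to a patient question.
response = (
    "Oropharyngeal cancer is a type of head and neck cancer that develops "
    "in the oropharynx, the middle part of the throat behind the mouth."
)

# Flesch Reading Ease: higher scores mean easier text (60-70 is roughly plain English).
fre = textstat.flesch_reading_ease(response)

# Flesch-Kincaid Grade Level: approximate U.S. school grade needed to understand the text.
fkrgl = textstat.flesch_kincaid_grade(response)

print(f"FRE: {fre:.1f}, FKRGL: {fkrgl:.1f}")
# Patient-education materials are typically targeted at or below a 6th grade level.
```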

RESULTS

Average scores for accuracy, comprehensiveness, and similarity to a physician response were 3.88 (SD = 0.99), 3.80 (SD = 1.14), and 3.67 (SD = 1.08), respectively. Posttreatment-related questions scored highest in all three domains, followed by treatment-related and then diagnosis-related questions, and scored significantly higher than diagnosis-related questions in all three domains (p < 0.01). Two themes emerged from the physician critiques: suboptimal educational value and the potential to misinform patients. The mean FRE and FKRGL scores both indicated a readability level above the 11th grade, higher than the 6th grade level recommended for patient materials.

CONCLUSION

ChatGPT responses may not educate patients to an appropriate degree, could outright misinform them, and read at a more difficult grade level than is recommended for patient materials. Because oropharyngeal cancer patients are a vulnerable population facing complex, life-altering diagnoses and treatments, they should be cautious when consuming chatbot-generated medical information.

LEVEL OF EVIDENCE

N/A. Laryngoscope, 134:2252-2257, 2024.

