Evaluating ChatGPT Responses on Thyroid Nodules for Patient Education.

Author Information

Department of Otolaryngology-Head and Neck Surgery, Thomas Jefferson University Hospitals, Philadelphia, Pennsylvania, USA.

Publication Information

Thyroid. 2024 Mar;34(3):371-377. doi: 10.1089/thy.2023.0491. Epub 2023 Dec 26.

Abstract

ChatGPT, an artificial intelligence (AI) chatbot, is the fastest-growing consumer application in history. Given recent trends identifying increasing patient use of Internet sources for self-education, we sought to evaluate the quality of ChatGPT-generated responses for patient education on thyroid nodules. ChatGPT was queried 4 times with 30 identical questions. Queries differed by initial chatbot prompting: no prompting, patient-friendly prompting, 8th-grade-level prompting, and prompting for references. Answers were scored on a hierarchical scale: incorrect, partially correct, correct, or correct with references. Proportions of responses at incremental score thresholds were compared by prompt type using chi-squared analysis. The Flesch-Kincaid grade level was calculated for each answer, and the relationship between prompt type and grade level was assessed using analysis of variance. References provided within ChatGPT answers were totaled and analyzed for veracity. Across all prompts (n = 120 questions), 83 answers (69.2%) were at least correct. Proportions of responses that were at least partially correct (p = 0.795) and correct (p = 0.402) did not differ by prompt; proportions of responses that were correct with references did (p < 0.0001). Responses from 8th-grade-level prompting had the lowest mean grade level (13.43 ± 2.86), significantly lower than no prompting (14.97 ± 2.01, p = 0.01) and prompting for references (16.43 ± 2.05, p < 0.0001). Prompting for references generated 80 references to medical publications within answers, 80/80 (100%). Seventy references (87.5%) were legitimate citations, and 58/80 (72.5%) accurately reported information from the referenced publication. Overall, ChatGPT provides appropriate answers to most questions on thyroid nodules regardless of prompting. Despite targeted prompting strategies, ChatGPT reliably generates responses at grade levels well above accepted recommendations for presenting medical information to patients. Significant rates of AI hallucination may preclude clinicians from recommending the current version of ChatGPT as an educational tool for patients at this time.
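The readability comparison above rests on the Flesch-Kincaid grade level, a published formula: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. As a minimal sketch of that computation in Python, the snippet below scores a sample answer; the vowel-group syllable counter is a naive heuristic and the sample text is hypothetical, so this illustrates the metric rather than reproducing the study's validated analysis.

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: one syllable per run of consecutive vowels.
    # A formal readability analysis would use a validated syllable counter.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    # Flesch-Kincaid grade level:
    #   0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

# Hypothetical chatbot-style answer for illustration only.
answer = ("Thyroid nodules are growths within the thyroid gland. "
          "Most nodules are benign and cause no symptoms.")
print(f"Estimated Flesch-Kincaid grade level: {flesch_kincaid_grade(answer):.2f}")
```

Because the formula weights words per sentence and syllables per word, the longer sentences and polysyllabic medical vocabulary typical of chatbot answers push scores toward the college-level grades (13-16) reported in the study.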
