Evaluating ChatGPT Responses on Thyroid Nodules for Patient Education.

Author Information

Department of Otolaryngology-Head and Neck Surgery, Thomas Jefferson University Hospitals, Philadelphia, Pennsylvania, USA.

Publication Information

Thyroid. 2024 Mar;34(3):371-377. doi: 10.1089/thy.2023.0491. Epub 2023 Dec 26.

Abstract

ChatGPT, an artificial intelligence (AI) chatbot, is the fastest-growing consumer application in history. Given recent trends identifying increasing patient use of Internet sources for self-education, we sought to evaluate the quality of ChatGPT-generated responses for patient education on thyroid nodules. ChatGPT was queried 4 times with 30 identical questions. Queries differed by initial chatbot prompting: no prompting, patient-friendly prompting, 8th-grade-level prompting, and prompting for references. Answers were scored on a hierarchical scale: incorrect, partially correct, correct, or correct with references. Proportions of responses at incremental score thresholds were compared by prompt type using chi-squared analysis. The Flesch-Kincaid grade level was calculated for each answer, and the relationship between prompt type and grade level was assessed using analysis of variance. References provided within ChatGPT answers were totaled and analyzed for veracity. Across all prompts (n = 120 questions), 83 answers (69.2%) were at least correct. Proportions of responses that were at least partially correct (p = 0.795) and correct (p = 0.402) did not differ by prompt; proportions of responses that were correct with references did (p < 0.0001). Responses from 8th-grade-level prompting had the lowest mean grade level (13.43 ± 2.86), significantly lower than no prompting (14.97 ± 2.01, p = 0.01) and prompting for references (16.43 ± 2.05, p < 0.0001). Prompting for references generated 80 references to medical publications within answers, 80/80 (100%). Seventy references (87.5%) were legitimate citations, and 58/80 (72.5%) accurately reported information from the referenced publication. Overall, ChatGPT provides appropriate answers to most questions on thyroid nodules regardless of prompting. Despite targeted prompting strategies, ChatGPT reliably generates responses at grade levels well above accepted recommendations for presenting medical information to patients. Significant rates of AI hallucination may preclude clinicians from recommending the current version of ChatGPT as an educational tool for patients at this time.
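The readability comparison above rests on the Flesch-Kincaid grade level, a published formula: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. As a minimal sketch of that computation in Python, the snippet below scores a sample answer; the vowel-group syllable counter is a naive heuristic and the sample text is hypothetical, so this illustrates the metric rather than reproducing the study's validated analysis.

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: one syllable per run of consecutive vowels.
    # A formal readability analysis would use a validated syllable counter.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    # Flesch-Kincaid grade level:
    #   0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

# Hypothetical chatbot-style answer for illustration only.
answer = ("Thyroid nodules are growths within the thyroid gland. "
          "Most nodules are benign and cause no symptoms.")
print(f"Estimated Flesch-Kincaid grade level: {flesch_kincaid_grade(answer):.2f}")
```

Because the formula weights words per sentence and syllables per word, the longer sentences and polysyllabic medical vocabulary typical of chatbot answers push scores toward the college-level grades (13-16) reported in the study.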
