Rossi Meyer Monica K, Kandathil Cherian Kurian, Davis Seth J, Durairaj K Kay, Patel Priyesh N, Pepper Jon-Paul, Spataro Emily A, Most Sam P
Division of Facial Plastic and Reconstructive Surgery, Department of Otolaryngology-Head and Neck Surgery, Stanford University School of Medicine, Stanford, CA, USA.
Department of Otolaryngology, Head and Neck Surgery, Huntington Hospital, Pasadena, California, USA.
Aesthetic Plast Surg. 2025 Apr;49(7):1868-1873. doi: 10.1007/s00266-024-04343-0. Epub 2024 Sep 16.
Assessment of the readability, accuracy, quality, and completeness of ChatGPT (OpenAI, San Francisco, CA), Gemini (Google, Mountain View, CA), and Claude (Anthropic, San Francisco, CA) responses to common questions about rhinoplasty.
Ten questions commonly encountered in the senior author's (SPM) rhinoplasty practice were presented to ChatGPT-4, Gemini, and Claude. Seven facial plastic and reconstructive surgeons with experience in rhinoplasty evaluated these responses for accuracy, quality, completeness, relevance, and use of medical jargon on a Likert scale. The responses were also evaluated using several readability indices.
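The abstract does not specify which readability indices were applied or how they were computed. As a minimal sketch of how such scoring might be done, the Python textstat package (an assumption; the authors' actual tooling is not stated) can evaluate a sample chatbot response against several standard readability formulas, where grade-level values of roughly 13 or higher correspond to college-level reading.

```python
# Minimal sketch (not the study's analysis code): scoring a chatbot response
# with common readability formulas via the Python "textstat" package. The
# abstract does not list the exact indices used, so the ones chosen below
# are illustrative assumptions.
import textstat

response = (
    "Rhinoplasty is a surgical procedure that reshapes the nasal framework. "
    "Postoperative recovery typically involves edema and ecchymosis for one "
    "to two weeks, with the final contour emerging over the following year."
)

# Flesch Reading Ease is a 0-100 score (lower = harder to read).
print("Flesch Reading Ease: ", textstat.flesch_reading_ease(response))

# The remaining indices approximate a U.S. school grade level;
# values of roughly 13 and above correspond to college-level reading.
print("Flesch-Kincaid Grade:", textstat.flesch_kincaid_grade(response))
print("Gunning Fog Index:   ", textstat.gunning_fog(response))
print("SMOG Index:          ", textstat.smog_index(response))
print("Coleman-Liau Index:  ", textstat.coleman_liau_index(response))
```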
ChatGPT achieved significantly higher evaluator scores for accuracy and overall quality but scored significantly lower on completeness than Gemini and Claude. Responses from all three chatbots to the ten questions were rated as neutral to incomplete. All three chatbots used medical jargon, and their responses scored at a college reading level on the readability indices.
Rhinoplasty surgeons should be aware that the medical information found on chatbot platforms is incomplete and still needs to be scrutinized for accuracy. However, the technology has potential for use in healthcare education if it is trained on evidence-based recommendations and its readability is improved.
Level of Evidence V: This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors at www.springer.com/00266.