ChatGPT's quiz skills in different otolaryngology subspecialties: an analysis of 2576 single-choice and multiple-choice board certification preparation questions.

Author Affiliations

Department of Otolaryngology, Head and Neck Surgery, School of Medicine, Technical University of Munich (TUM), Ismaningerstrasse 22, 81675, Munich, Germany.

Department of Otorhinolaryngology, Head and Neck Surgery, Medical Faculty, University of Cologne, 50937, Cologne, Germany.

Publication Information

Eur Arch Otorhinolaryngol. 2023 Sep;280(9):4271-4278. doi: 10.1007/s00405-023-08051-4. Epub 2023 Jun 7.

Abstract

PURPOSE

With the increasing adoption of artificial intelligence (AI) in various domains, including healthcare, there is growing acceptance and interest in consulting AI models to provide medical information and advice. This study aimed to evaluate the accuracy of ChatGPT's responses to practice quiz questions designed for otolaryngology board certification and decipher potential performance disparities across different otolaryngology subspecialties.

METHODS

A dataset covering 15 otolaryngology subspecialties was collected from an online learning platform, funded by the German Society of Oto-Rhino-Laryngology, Head and Neck Surgery, that is designed for board certification examination preparation. These questions were entered into ChatGPT, and its responses were analyzed for accuracy and for variance in performance across subspecialties.

RESULTS

The dataset included 2576 questions (479 multiple-choice and 2097 single-choice), of which 57% (n = 1475) were answered correctly by ChatGPT. An in-depth analysis of question style revealed that single-choice questions were associated with a significantly higher rate (p < 0.001) of correct responses (n = 1313; 63%) compared to multiple-choice questions (n = 162; 34%). Stratified by question categories, ChatGPT yielded the highest rate of correct responses (n = 151; 72%) in the field of allergology, whereas 7 out of 10 questions (n = 65; 71%) on legal otolaryngology aspects were answered incorrectly.
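The single- versus multiple-choice comparison above can be recomputed from the reported counts. The abstract does not state which statistical test produced p < 0.001; a Pearson chi-square test of independence on the 2x2 contingency table is one standard choice and is assumed in this sketch.

```python
# Hedged sketch: recompute the question-style comparison from the
# abstract's counts. The paper's exact test is not stated here; a
# Pearson chi-square test of independence is assumed.

def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 contingency table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, obs_row in enumerate(table):
        for j, obs in enumerate(obs_row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    return chi2

# Counts from the abstract: rows = question style, cols = (correct, incorrect)
table = [
    [1313, 2097 - 1313],  # single-choice: 1313/2097 correct (~63%)
    [162, 479 - 162],     # multiple-choice: 162/479 correct (~34%)
]

chi2 = chi_square_2x2(table)
# The critical value for df = 1 at alpha = 0.001 is 10.828; a statistic
# far above it is consistent with the reported p < 0.001.
print(round(chi2, 1), chi2 > 10.828)
```

The statistic comes out two orders of magnitude above the 0.001 critical value, so the reported significance is robust to the choice of test (a Fisher exact test on these counts would agree).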

CONCLUSION

The study reveals ChatGPT's potential as a supplementary tool for otolaryngology board certification preparation. However, its propensity for errors in certain otolaryngology areas calls for further refinement. Future research should address these limitations to improve ChatGPT's educational use; a collaborative approach involving domain experts is recommended for the reliable and accurate integration of such AI models.


Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1690/10382366/15ddde19c19a/405_2023_8051_Fig1_HTML.jpg
