
Domain-Specific Customization for Language Models in Otolaryngology: The ENT GPT Assistant.

Authors

Bicknell Brenton T, Rivers Nicholas J, Skelton Adam, Sheehan Delaney, Hodges Charis, Fairburn Stevan C, Greene Benjamin J, Panuganti Bharat

Affiliations

UAB Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, USA.

Department of Otolaryngology-Head and Neck Surgery, University of Alabama at Birmingham, Birmingham, Alabama, USA.

Publication

OTO Open. 2025 May 5;9(2):e70125. doi: 10.1002/oto2.70125. eCollection 2025 Apr-Jun.

Abstract

OBJECTIVE

To develop and evaluate the effectiveness of domain-specific customization in large language models (LLMs) by assessing the performance of the ENT GPT Assistant (E-GPT-A), a model specifically tailored for otolaryngology.

STUDY DESIGN

Comparative analysis using multiple-choice questions (MCQs) from established otolaryngology resources.

SETTING

Tertiary care academic hospital.

METHODS

Two hundred forty clinical-vignette style MCQs were sourced from BoardVitals Otolaryngology and OTOQuest, covering a range of otolaryngology subspecialties (n = 40 for each). The E-GPT-A was developed using targeted instructions and customized to otolaryngology. The performance of E-GPT-A was compared against top-performing and widely used artificial intelligence (AI) LLMs, including GPT-3.5, GPT-4, Claude 2.0, and Claude 2.1. Accuracy was assessed across subspecialties, varying question difficulty tiers, and in diagnostics and management.

RESULTS

E-GPT-A achieved an overall accuracy of 74.6%, outperforming GPT-3.5 (60.4%), Claude 2.0 (61.7%), Claude 2.1 (60.8%), and GPT-4 (68.3%). The model performed best in allergy and rhinology (85.0%) and laryngology (82.5%), while showing lower accuracy in pediatrics (62.5%) and facial plastics/reconstructive surgery (67.5%). Accuracy also declined as question difficulty increased. On the same question set, the average correct response rate among otolaryngologists and otolaryngology trainees was 71.1%.

CONCLUSION

This pilot study of the E-GPT-A demonstrates the potential benefits of domain-specific customization of language models for otolaryngology. However, further development, continuous updates, and real-world validation are needed to fully assess the capabilities of LLMs in otolaryngology.
