
Domain-Specific Customization for Language Models in Otolaryngology: The ENT GPT Assistant.

Authors

Bicknell Brenton T, Rivers Nicholas J, Skelton Adam, Sheehan Delaney, Hodges Charis, Fairburn Stevan C, Greene Benjamin J, Panuganti Bharat

Affiliations

UAB Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, USA.

Department of Otolaryngology-Head and Neck Surgery, University of Alabama at Birmingham, Birmingham, Alabama, USA.

Publication

OTO Open. 2025 May 5;9(2):e70125. doi: 10.1002/oto2.70125. eCollection 2025 Apr-Jun.

Abstract

OBJECTIVE

To develop and evaluate the effectiveness of domain-specific customization in large language models (LLMs) by assessing the performance of the ENT GPT Assistant (E-GPT-A), a model specifically tailored for otolaryngology.

STUDY DESIGN

Comparative analysis using multiple-choice questions (MCQs) from established otolaryngology resources.

SETTING

Tertiary care academic hospital.

METHODS

Two hundred forty clinical-vignette style MCQs were sourced from BoardVitals Otolaryngology and OTOQuest, covering a range of otolaryngology subspecialties (n = 40 for each). The E-GPT-A was developed using targeted instructions and customized to otolaryngology. The performance of E-GPT-A was compared against top-performing and widely used artificial intelligence (AI) LLMs, including GPT-3.5, GPT-4, Claude 2.0, and Claude 2.1. Accuracy was assessed across subspecialties, varying question difficulty tiers, and in diagnostics and management.

RESULTS

E-GPT-A achieved an overall accuracy of 74.6%, outperforming GPT-3.5 (60.4%), Claude 2.0 (61.7%), Claude 2.1 (60.8%), and GPT-4 (68.3%). The model performed best in allergy and rhinology (85.0%) and laryngology (82.5%), while showing lower accuracy in pediatrics (62.5%) and facial plastics/reconstructive surgery (67.5%). Accuracy also declined as question difficulty increased. On the same question set, the average correct response rate among otolaryngologists and otolaryngology trainees was 71.1%.

CONCLUSION

This pilot study of the E-GPT-A demonstrates the potential benefits of domain-specific customization of language models for otolaryngology. However, further development, continuous updates, and real-world validation are needed to fully assess the capabilities of LLMs in otolaryngology.
