Gregory J. Booth, Thomas Hauert, Mike Mynes, John Hodgson, Elizabeth Slama, Ashton Goldman, Jeffrey Moore
The following authors are affiliated with both the Department of Anesthesiology, Uniformed Services University, Bethesda, MD, and the Department of Anesthesiology and Pain Medicine, Naval Medical Center Portsmouth, Portsmouth, VA: Gregory J. Booth is an Associate Professor at Uniformed Services University and Program Director, Anesthesiology Residency, at Naval Medical Center Portsmouth; Mike Mynes and Elizabeth Slama are Assistant Professors at Uniformed Services University and Staff Anesthesiologists at Naval Medical Center Portsmouth; Jeffrey Moore is an Assistant Professor at Uniformed Services University and Program Director, Pain Medicine Fellowship, and Associate Designated Institutional Official at Naval Medical Center Portsmouth. Thomas Hauert is an Anesthesiology Resident Physician at Naval Medical Center Portsmouth, Portsmouth, VA. Ashton Goldman is an Associate Professor at Uniformed Services University, Bethesda, MD, and a Staff Orthopedic Surgeon in the Department of Orthopedic Surgery and Sports Medicine at Naval Medical Center Portsmouth, Portsmouth, VA. John Hodgson is an Associate Professor and Program Director, Anesthesiology Residency, at the University of South Florida, Tampa, FL.
J Educ Perioper Med. 2024 Sep 30;26(3):E729. doi: 10.46374/VolXXVI_Issue3_Moore. eCollection 2024 Jul-Sep.
Natural language processing is a collection of techniques designed to empower computer systems to comprehend and/or produce human language. The purpose of this investigation was to train several large language models (LLMs) to explore the tradeoff between model complexity and performance while classifying narrative feedback on trainees into the Accreditation Council for Graduate Medical Education subcompetencies. We hypothesized that classification accuracy would increase with model complexity.
The authors fine-tuned several transformer-based LLMs (Bidirectional Encoder Representations from Transformers [BERT]-base, BERT-medium, BERT-small, BERT-mini, BERT-tiny, and SciBERT) to predict Accreditation Council for Graduate Medical Education subcompetencies on a curated dataset of 10 218 feedback comments. Performance was compared with the authors' previous work, which trained a FastText model on the same dataset. Performance metrics included F1 score for global model performance and area under the receiver operating characteristic curve for each competency.
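The two metrics named above can be illustrated with a minimal sketch. The data below are toy values for three hypothetical subcompetency classes, not the study's feedback comments or results; the class labels and probability scores are invented for illustration only.

```python
# Sketch of the evaluation metrics described above: macro-averaged F1 for
# global performance and one-vs-rest AUROC per class. All values are
# hypothetical stand-ins, not the authors' data.
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score

# Toy ground-truth labels for three hypothetical subcompetency classes.
y_true = np.array([0, 0, 1, 1, 2, 2])

# Toy per-class probability scores, as a classifier head might emit.
probs = np.array([
    [0.90, 0.05, 0.05],
    [0.40, 0.50, 0.10],
    [0.20, 0.70, 0.10],
    [0.10, 0.80, 0.10],
    [0.30, 0.20, 0.50],
    [0.50, 0.10, 0.40],
])

# Hard predictions are the highest-scoring class per comment.
y_pred = probs.argmax(axis=1)

# Global performance: F1 averaged equally across classes.
macro_f1 = f1_score(y_true, y_pred, average="macro")

# Per-competency discrimination: one-vs-rest AUROC for each class.
auroc = {
    k: roc_auc_score((y_true == k).astype(int), probs[:, k])
    for k in range(probs.shape[1])
}

print(round(macro_f1, 4), {k: round(v, 3) for k, v in auroc.items()})
```

Macro averaging weights each subcompetency equally regardless of how many comments it receives, and one-vs-rest AUROC lets each competency's discrimination be reported separately, matching the per-competency comparison in the results.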
No model was superior to FastText, and only BERT-tiny performed worse than FastText. The smallest model with performance comparable to FastText, BERT-mini, was 94% smaller. Area under the receiver operating characteristic curve for each competency was similar between BERT-mini and FastText, with the exceptions of Patient Care 7 (Situational Awareness and Crisis Management) and Systems-Based Practice.
Transformer-based LLMs were fine-tuned to understand the language of anesthesiology graduate medical education. More complex LLMs did not outperform FastText; however, equivalent performance was achieved with a model 94% smaller, which may allow deployment on personal devices to improve speed and data privacy. This work advances our understanding of best practices for integrating LLMs into graduate medical education.