Development and Evaluation of an Artificial Intelligence-Powered Surgical Oral Examination Simulator: A Pilot Study.

Author information

Rao Arya S, Prasad Siona, Lee Richard S, Farrell Susan, McKinley Sophia, Succi Marc D

Affiliations

Harvard Medical School, Boston, MA.

Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center (MESH IO), Mass General Brigham, Boston, MA.

Publication information

Mayo Clin Proc Digit Health. 2025 Jun 9;3(3):100241. doi: 10.1016/j.mcpdig.2025.100241. eCollection 2025 Sep.

DOI:10.1016/j.mcpdig.2025.100241

PMID:40677929

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12270061/

Abstract

OBJECTIVE

To develop and validate an artificial intelligence-powered platform that simulates surgical oral examinations, addressing the limitations of traditional faculty-led sessions.

PATIENTS AND METHODS

This cross-sectional study, conducted from June 1, 2024, through December 1, 2024, comprised technical validation and educational assessment of a novel large language model (LLM)-based surgical education tool (surgery oral examination large language model [SOE-LLM]). The study involved 12 surgical clerkship students completing their core rotation at a major academic medical center. The SOE-LLM, using MIMIC-IV-derived surgical cases (acute appendicitis and pancreatitis), was implemented to simulate oral examinations. Technical validation assessed performance across 8 domains: case presentation accuracy, physical examination findings, historical detail preservation, laboratory data reporting, imaging interpretation, management decisions, and recognition of contraindicated interventions. Educational utility was evaluated using a 5-point Likert scale.

RESULTS

Technical validation showed the SOE-LLM's ability to function as a consistent oral examiner. The model accurately guided students through case presentations, responded to diagnostic questions, and provided clinically sound responses based on MIMIC-IV cases. When tested with standardized prompts, it maintained examination fidelity, requiring proper diagnostic reasoning and differentiating operative versus medical management. Student evaluations highlighted the platform's value as an examination preparation tool (mean, 4.250; SEM, 0.1794) and its ability to create a low-stakes environment for high-stakes decision practice (mean, 4.833; SEM, 0.1124).
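
As a worked illustration of how a mean and SEM arise from 12 responses on a 5-point Likert scale, the following Python sketch computes both statistics. The two response lists are hypothetical: the abstract does not report raw data, and these patterns were chosen only because they reproduce the reported values after rounding. The function name `mean_and_sem` is likewise an assumption made for illustration.

```python
import math
import statistics


def mean_and_sem(responses):
    """Return (mean, standard error of the mean) for a list of ratings.

    SEM is the sample standard deviation (n - 1 denominator)
    divided by the square root of the number of responses.
    """
    n = len(responses)
    mean = statistics.mean(responses)
    sd = statistics.stdev(responses)  # sample SD, uses n - 1
    return mean, sd / math.sqrt(n)


# HYPOTHETICAL response patterns for 12 students (not the study's raw data).
# They are only one possible set of ratings consistent with the summary values.
exam_prep = [5] * 4 + [4] * 7 + [3] * 1   # examination preparation item
low_stakes = [5] * 10 + [4] * 2           # low-stakes environment item

for label, ratings in [("exam_prep", exam_prep), ("low_stakes", low_stakes)]:
    mean, sem = mean_and_sem(ratings)
    print(f"{label}: mean={mean:.3f}, SEM={sem:.4f}")
```

Read in the other direction, multiplying each reported SEM by the square root of 12 gives an implied sample standard deviation of roughly 0.62 for the examination preparation item and 0.39 for the low-stakes environment item.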

CONCLUSION

The SOE-LLM shows potential as a valuable tool for surgical education, offering a consistent and accessible platform for simulating oral examinations.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2719/12270061/11875f2deac3/gr1.jpg
