Evaluating Large Language Model's accuracy in current procedural terminology coding given operative note templates across various plastic surgery sub-specialties.

Author information

Carrarini Mia J, Liu Hilary Y, Perez Catherine K, Egro Francesco M

Affiliation

Department of Plastic Surgery, University of Pittsburgh Medical Center, Pittsburgh, PA 15219, USA.

Publication information

J Plast Reconstr Aesthet Surg. 2025 Jul;106:50-52. doi: 10.1016/j.bjps.2025.04.025. Epub 2025 Apr 23.

DOI:10.1016/j.bjps.2025.04.025

PMID:40367652

Abstract

BACKGROUND

Manual Current Procedural Terminology (CPT) coding from operative notes is a time-intensive process that adds to the administrative burden in healthcare. Large Language Models (LLMs) offer a promising solution, but their accuracy in assigning CPT codes based on full operative note templates remains largely untested. This study therefore evaluates the ability of three LLMs (GPT-4, Gemini, and Copilot) to generate accurate CPT codes from operative note templates across diverse plastic surgery procedures.

METHODS

Twenty-six deidentified operative note templates from six plastic surgery subspecialties were entered into each LLM using a standardized prompt requesting appropriate CPT codes. Model outputs were compared to surgeon-verified codes and categorized as correct (all codes accurate), partially correct (some correct codes with errors), or incorrect (no correct codes). Accuracy was analyzed overall and by subspecialty using Extended Fisher's Exact Tests (significance set at p < 0.05).
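The three-way categorization described above can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure: it assumes that "correct" means the model's code set exactly matches the surgeon-verified set, and the names `Outcome` and `categorize` as well as the placeholder codes are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    CORRECT = "correct"
    PARTIAL = "partially correct"
    INCORRECT = "incorrect"


def categorize(predicted, reference):
    """Compare a model's CPT codes with the surgeon-verified codes.

    Assumption: order and duplicates do not matter, so both lists
    are compared as sets.
    """
    predicted, reference = set(predicted), set(reference)
    if predicted == reference:
        return Outcome.CORRECT      # all codes accurate
    if predicted & reference:
        return Outcome.PARTIAL      # some correct codes, with errors
    return Outcome.INCORRECT        # no correct codes


# Placeholder strings stand in for real CPT codes.
print(categorize(["CODE_A", "CODE_C"], ["CODE_A", "CODE_B"]).value)
```

Under this reading, an output that contains every verified code plus one extra code counts as partially correct rather than correct.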

RESULTS

There was a significant difference in overall coding accuracy between the models (p = 0.02176). Gemini and Copilot had the highest accuracy rates (19.2% each), though Copilot produced more partially correct outputs (53.8%). GPT-4 had the lowest accuracy (7.7%). Subspecialty analysis showed Gemini performed best in aesthetic surgery (60%), while Copilot was most accurate in general reconstruction (42.9%). None of the models correctly coded breast reconstruction or craniofacial trauma procedures. Frequent errors included misidentification of procedural details and inappropriate bundling of CPT codes.
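Because only 26 templates were tested, each overall percentage corresponds to a small number of notes. The sketch below back-calculates those counts from the reported percentages. The counts are inferred here, not stated in the abstract, and `implied_count` is a hypothetical helper name.

```python
N_TEMPLATES = 26  # number of operative note templates, from METHODS


def implied_count(percent, n=N_TEMPLATES):
    """Nearest whole number of templates matching a reported percentage."""
    return round(percent * n / 100)


# Overall percentages as reported in RESULTS.
reported = {
    "Gemini, fully correct": 19.2,
    "Copilot, fully correct": 19.2,
    "Copilot, partially correct": 53.8,
    "GPT-4, fully correct": 7.7,
}

for label, pct in reported.items():
    print(f"{label}: {pct}% of {N_TEMPLATES} is about {implied_count(pct)}")
```

Read this way, the best-performing models coded roughly 5 of 26 templates fully correctly, and GPT-4 roughly 2 of 26.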

CONCLUSION

LLMs show potential for automating CPT coding but currently lack the contextual understanding required for reliable accuracy. Continued human oversight and model refinement are essential for the future success of LLM-based CPT coding.
