
Evaluating Large Language Model's accuracy in current procedural terminology coding given operative note templates across various plastic surgery sub-specialties.

Author information

Carrarini Mia J, Liu Hilary Y, Perez Catherine K, Egro Francesco M

Affiliations

Department of Plastic Surgery, University of Pittsburgh Medical Center, Pittsburgh, PA 15219, USA.

Publication information

J Plast Reconstr Aesthet Surg. 2025 Jul;106:50-52. doi: 10.1016/j.bjps.2025.04.025. Epub 2025 Apr 23.

Abstract

BACKGROUND

Manual Current Procedural Terminology (CPT) coding from operative notes is a time-intensive process that adds to the administrative burden in healthcare. Large Language Models (LLMs) offer a promising solution, but their accuracy in assigning CPT codes from full operative note templates remains largely untested. This study therefore evaluates the ability of three LLMs (GPT-4, Gemini, and Copilot) to generate accurate CPT codes from operative note templates across diverse plastic surgery procedures.

METHODS

Twenty-six deidentified operative note templates from six plastic surgery subspecialties were entered into each LLM using a standardized prompt requesting appropriate CPT codes. Model outputs were compared with surgeon-verified codes and categorized as correct (all codes accurate), partially correct (some correct codes with errors), or incorrect (no correct codes). Accuracy was analyzed overall and by subspecialty using Extended Fisher's Exact Tests (significance set at p < 0.05).
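As a rough illustration of this scoring scheme, the sketch below categorizes model-suggested codes against surgeon-verified codes and tabulates the outcomes per model. It is not the authors' actual pipeline: the function names and placeholder code strings are assumptions, and SciPy's chi-square test of independence stands in for the Extended Fisher's Exact Test, since SciPy's exact routine handles only 2x2 tables.

```python
# Minimal sketch of the correct / partially correct / incorrect scoring
# described above; an illustration under stated assumptions, not the study's
# actual code. Placeholder CPT strings are not real procedure codes.
from collections import Counter

from scipy.stats import chi2_contingency

CATEGORIES = ["correct", "partially correct", "incorrect"]


def categorize(predicted: set[str], verified: set[str]) -> str:
    """Score one note: an exact match of the surgeon-verified code set is
    'correct', any overlap with errors or omissions is 'partially correct',
    and no overlap is 'incorrect'."""
    if predicted == verified:
        return "correct"
    if predicted & verified:
        return "partially correct"
    return "incorrect"


def contingency_table(labels_by_model: dict[str, list[str]]) -> list[list[int]]:
    """Build a models-by-outcome count table for the association test."""
    return [
        [Counter(labels)[c] for c in CATEGORIES]
        for labels in labels_by_model.values()
    ]


if __name__ == "__main__":
    # Two toy notes with placeholder (not real) CPT code strings.
    verified = [{"11111", "22222"}, {"33333"}]
    predictions = {
        "Model A": [{"11111", "22222"}, {"44444"}],  # one correct, one incorrect
        "Model B": [{"11111"}, {"33333"}],           # one partial, one correct
    }
    labels = {
        model: [categorize(p, v) for p, v in zip(preds, verified)]
        for model, preds in predictions.items()
    }
    table = contingency_table(labels)
    chi2, p_value, dof, _ = chi2_contingency(table)
    print(labels)
    print(table)
    print(f"p = {p_value:.4f}")
```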

RESULTS

There was a significant difference in overall coding accuracy between the models (p = 0.02176). Gemini and Copilot had the highest accuracy rates (19.2% each), though Copilot produced more partially correct outputs (53.8%). GPT-4 had the lowest accuracy (7.7%). Subspecialty analysis showed that Gemini performed best in aesthetic surgery (60%), while Copilot was most accurate in general reconstruction (42.9%). None of the models correctly coded breast reconstruction or craniofacial trauma procedures. Frequent errors included misidentification of procedural details and inappropriate bundling of CPT codes.

CONCLUSION

LLMs show potential for automating CPT coding but currently lack the contextual understanding required for reliable accuracy. Continued human oversight and model refinement are essential for the future success of LLM-based CPT coding.
