Artificial Intelligence in Surgical Coding: Evaluating Large Language Models for Current Procedural Terminology Accuracy in Hand Surgery.

Author information

Isch Emily L, Lee Jamie, Self D Mitchell, Sambangi Abhijeet, Habarth-Morales Theodore E, Vaile John, Caterson E J

Affiliations

Department of General Surgery, Thomas Jefferson University, Philadelphia, PA.

Drexel University College of Medicine, Philadelphia, PA.

Publication information

J Hand Surg Glob Online. 2025 Jan 9;7(2):181-185. doi: 10.1016/j.jhsg.2024.11.013. eCollection 2025 Mar.

DOI:10.1016/j.jhsg.2024.11.013

PMID:40182863

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11963066/

Abstract

PURPOSE

The advent of large language models (LLMs) like ChatGPT has introduced notable advancements in various surgical disciplines. These developments have led to an increased interest in the use of LLMs for Current Procedural Terminology (CPT) coding in surgery. With CPT coding being a complex and time-consuming process, often exacerbated by the scarcity of professional coders, there is a pressing need for innovative solutions to enhance coding efficiency and accuracy.

METHODS

This observational study evaluated the effectiveness of five publicly available large language models (Perplexity.AI, Bard, Bing AI, ChatGPT 3.5, and ChatGPT 4.0) in accurately identifying CPT codes for hand surgery procedures. A consistent query format was employed to test each model, ensuring the inclusion of detailed procedure components where necessary. The responses were classified as correct, partially correct, or incorrect based on their alignment with established CPT coding for the specified procedures.
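The three-way grading described above can be illustrated with a minimal sketch. The abstract does not define the exact grading rule, so the rule used here is an assumption: a response is correct when its code set matches the reference set exactly, partially correct when the two sets overlap, and incorrect otherwise. The names `Outcome` and `grade_response` are hypothetical, and the code strings are placeholders rather than real CPT codes.

```python
from enum import Enum


class Outcome(Enum):
    CORRECT = "correct"
    PARTIALLY_CORRECT = "partially correct"
    INCORRECT = "incorrect"


def grade_response(model_codes, reference_codes):
    """Grade one model response against the reference code set.

    Assumed rule (not taken from the study): an exact set match is
    correct, any overlap is partially correct, no overlap is incorrect.
    """
    predicted = set(model_codes)
    reference = set(reference_codes)
    if predicted == reference:
        return Outcome.CORRECT
    if predicted & reference:
        return Outcome.PARTIALLY_CORRECT
    return Outcome.INCORRECT


# Placeholder strings, not real CPT codes.
reference = {"CODE_A", "CODE_B"}
for answer in (["CODE_A", "CODE_B"], ["CODE_A"], ["CODE_X"]):
    print(answer, "->", grade_response(answer, reference).value)
```

Comparing sets rather than ordered lists means the order in which a model lists codes does not affect the grade. A stricter scheme, for example one that penalizes extra codes differently from missing ones, would need a different rule.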

RESULTS

In the evaluation of artificial intelligence (AI) model performance on simple procedures, Perplexity.AI achieved the highest number of correct outcomes (15), followed by Bard and Bing AI (14 each). ChatGPT 4 and ChatGPT 3.5 yielded 8 and 7 correct outcomes, respectively. For complex procedures, Perplexity.AI and Bard each had three correct outcomes, whereas ChatGPT models had none. Bing AI had the highest number of partially correct outcomes (5). There were significant associations between AI models and performance outcomes for both simple and complex procedures.
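An association between model and outcome category is typically tested on a contingency table of counts. The abstract does not state which test was used or report the full table, so the following is only a sketch, assuming a Pearson chi-square test of independence. The function name `chi_square_statistic` is hypothetical, and the counts in the usage lines are toy values that are not taken from the study.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees_of_freedom) for a table of counts.

    Rows are models and columns are outcome categories. Assumes every
    row and every column contains at least one observation, so that no
    expected count is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of model and outcome.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Toy counts for two hypothetical models (correct, not correct).
# These numbers are illustrative only and do not come from the study.
toy_table = [[8, 2], [3, 7]]
statistic, dof = chi_square_statistic(toy_table)
print(f"chi-square = {statistic:.3f}, degrees of freedom = {dof}")
```

With small expected counts, as is likely for the complex procedures, an exact test such as Fisher's may be more appropriate than the chi-square approximation.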

CONCLUSIONS

This study highlights the feasibility and potential benefits of integrating LLMs into the CPT coding process for hand surgery. The findings advocate for further refinement and training of AI models to improve their accuracy and practicality, suggesting a future where AI-assisted coding could become a standard component of surgical workflows, aligning with the ongoing digital transformation in health care.

TYPE OF STUDY/LEVEL OF EVIDENCE: Observational, IIIb.
