Artificial Intelligence in Surgical Coding: Evaluating Large Language Models for Current Procedural Terminology Accuracy in Hand Surgery.

Author information

Isch Emily L, Lee Jamie, Self D Mitchell, Sambangi Abhijeet, Habarth-Morales Theodore E, Vaile John, Caterson E J

Affiliations

Department of General Surgery, Thomas Jefferson University, Philadelphia, PA.

Drexel University College of Medicine, Philadelphia, PA.

Publication information

J Hand Surg Glob Online. 2025 Jan 9;7(2):181-185. doi: 10.1016/j.jhsg.2024.11.013. eCollection 2025 Mar.

Abstract

PURPOSE

The advent of large language models (LLMs) such as ChatGPT has brought notable advances to various surgical disciplines. These developments have led to growing interest in using LLMs for Current Procedural Terminology (CPT) coding in surgery. Because CPT coding is complex and time-consuming, and a shortage of professional coders often compounds the burden, there is a pressing need for innovative solutions that improve coding efficiency and accuracy.

METHODS

This observational study evaluated how accurately five publicly available large language models (Perplexity.AI, Bard, Bing AI, ChatGPT 3.5, and ChatGPT 4.0) identified CPT codes for hand surgery procedures. Each model was tested with a consistent query format that included detailed procedure components where necessary. Responses were classified as correct, partially correct, or incorrect according to their agreement with the established CPT codes for the specified procedures.
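
The abstract does not state exactly how "partially correct" was judged. As a sketch only, assuming each procedure has a reference set of CPT codes and that a response counts as partially correct when it shares at least one code with that set without matching it exactly, the grading step could look like the following. The function name `classify_response` and the placeholder codes (`CODE_A` and so on) are hypothetical and are not the study's published criteria.

```python
from typing import Iterable


def classify_response(expected: Iterable[str], predicted: Iterable[str]) -> str:
    """Grade one model response against the reference CPT codes for a procedure.

    Hypothetical rule (an assumption, not the study's stated method):
      - "correct": the predicted code set equals the expected set exactly
      - "partially correct": some overlap with the expected set, but not an exact match
      - "incorrect": no expected code appears in the prediction
    """
    exp = {code.strip() for code in expected}
    pred = {code.strip() for code in predicted}
    if not exp:
        raise ValueError("expected code set must not be empty")
    if pred == exp:
        return "correct"
    if pred & exp:  # at least one shared code
        return "partially correct"
    return "incorrect"


if __name__ == "__main__":
    # Placeholder codes only; these are not real CPT codes.
    print(classify_response(["CODE_A", "CODE_B"], ["CODE_A"]))
```

Treating codes as sets makes the grade independent of the order in which a model lists them; a stricter rule could also penalize extra, unrequested codes.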

RESULTS

In the evaluation of artificial intelligence (AI) model performance on simple procedures, Perplexity.AI achieved the highest number of correct outcomes (15), followed by Bard and Bing AI (14 each). ChatGPT 4 and ChatGPT 3.5 yielded 8 and 7 correct outcomes, respectively. For complex procedures, Perplexity.AI and Bard each had three correct outcomes, whereas ChatGPT models had none. Bing AI had the highest number of partially correct outcomes (5). There were significant associations between AI models and performance outcomes for both simple and complex procedures.
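
The abstract reports "significant associations" between model and outcome but does not name the test used. Assuming a Pearson chi-square test of independence on a model-by-outcome table of counts (a common choice; Fisher's exact test is an alternative when counts are small), the computation can be sketched with the standard library alone. The function names are hypothetical. Because the abstract gives only the correct-response counts and not the full tables, the example counts below are toy values, not study data.

```python
import math
from typing import Sequence


def chi2_sf(x: float, dof: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer dof,
    using the closed-form expressions for even and odd degrees of freedom."""
    if dof < 1:
        raise ValueError("dof must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{r=0}^{dof/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, dof // 2):
            term *= half / r
            total += term
        return math.exp(-half) * total
    # Odd dof: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_{r=1}^{(dof-1)/2} x^(r-1/2) / (1*3*...*(2r-1))
    term = math.sqrt(x)  # r = 1 term
    total = 0.0
    for r in range(1, (dof - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


def chi2_independence(table: Sequence[Sequence[float]]) -> tuple[float, int, float]:
    """Pearson chi-square test of independence (no continuity correction).

    Rows could be AI models and columns outcome categories
    (correct / partially correct / incorrect). Returns (statistic, dof, p_value).
    """
    rows = [[float(v) for v in row] for row in table]
    if len(rows) < 2:
        raise ValueError("need at least 2 rows")
    n_cols = len(rows[0])
    if n_cols < 2 or any(len(r) != n_cols for r in rows):
        raise ValueError("need a rectangular table with at least 2 columns")
    row_sums = [sum(r) for r in rows]
    col_sums = [sum(r[j] for r in rows) for j in range(n_cols)]
    grand = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / grand if grand else 0.0
            if expected == 0:
                raise ValueError("a row or column sums to zero; expected count undefined")
            stat += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (n_cols - 1)
    return stat, dof, chi2_sf(stat, dof)


if __name__ == "__main__":
    toy_counts = [[10, 0], [0, 10]]  # hypothetical counts, not from the study
    print(chi2_independence(toy_counts))
```

With five models and three outcome categories the test has (5 - 1)(3 - 1) = 8 degrees of freedom; in practice a library routine such as SciPy's `chi2_contingency` would normally be used instead of hand-rolled code.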

CONCLUSIONS

This study highlights the feasibility and potential benefits of integrating LLMs into the CPT coding process for hand surgery. The findings support further refinement and training of AI models to improve their accuracy and practicality, and point toward a future in which AI-assisted coding could become a standard component of surgical workflows, in line with the ongoing digital transformation of health care.

TYPE OF STUDY/LEVEL OF EVIDENCE: Observational, IIIb.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e06c/11963066/795b9ec8d894/gr1.jpg
