Evaluating the Efficacy of Large Language Models in CPT Coding for Craniofacial Surgery: A Comparative Analysis.

Author information

Isch Emily L, Sarikonda Advith, Sambangi Abhijeet, Carreras Angeleah, Sircar Adrija, Self D Mitchell, Habarth-Morales Theodore E, Caterson E J, Aycart Mario

Affiliations

Department of General Surgery, Thomas Jefferson University.

Sidney Kimmel Medical College at Thomas Jefferson University.

Publication information

J Craniofac Surg. 2025 May 1;36(3):831-835. doi: 10.1097/SCS.0000000000010575. Epub 2024 Sep 2.

DOI:10.1097/SCS.0000000000010575

PMID:39221924

Abstract

BACKGROUND

The advent of Large Language Models (LLMs) like ChatGPT has introduced significant advancements in various surgical disciplines. These developments have led to an increased interest in the utilization of LLMs for Current Procedural Terminology (CPT) coding in surgery. With CPT coding being a complex and time-consuming process, often exacerbated by the scarcity of professional coders, there is a pressing need for innovative solutions to enhance coding efficiency and accuracy.

METHODS

This observational study evaluated the effectiveness of 5 publicly available large language models (Perplexity.AI, Bard, BingAI, ChatGPT 3.5, and ChatGPT 4.0) in accurately identifying CPT codes for craniofacial procedures. A consistent query format was employed to test each model, ensuring the inclusion of detailed procedure components where necessary. The responses were classified as correct, partially correct, or incorrect based on their alignment with established CPT coding for the specified procedures.
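The three-way grading described above could be sketched as follows. The abstract does not state the exact rubric, so the rule used here (exact set match is correct, any overlap is partially correct, no overlap is incorrect) is an assumption, and the names `Grade` and `grade_response` are hypothetical.

```python
from enum import Enum


class Grade(Enum):
    CORRECT = "correct"
    PARTIALLY_CORRECT = "partially correct"
    INCORRECT = "incorrect"


def grade_response(predicted_codes, reference_codes):
    """Grade one model response against the reference CPT codes.

    Assumed rule (not taken from the study): an exact set match is
    correct, any overlap is partially correct, no overlap is incorrect.
    """
    # Normalize whitespace and drop empty entries before comparing.
    predicted = {code.strip() for code in predicted_codes if code.strip()}
    reference = {code.strip() for code in reference_codes if code.strip()}
    if not reference:
        raise ValueError("reference_codes must contain at least one code")
    if predicted == reference:
        return Grade.CORRECT
    if predicted & reference:
        return Grade.PARTIALLY_CORRECT
    return Grade.INCORRECT


# Placeholder strings stand in for real CPT codes.
print(grade_response(["CODE_A", "CODE_B"], ["CODE_B", "CODE_A"]).value)
print(grade_response(["CODE_A"], ["CODE_A", "CODE_B"]).value)
print(grade_response(["CODE_C"], ["CODE_A", "CODE_B"]).value)
```

Comparing sets rather than lists makes the grade independent of the order in which a model lists its codes, which seems a reasonable choice for multi-component procedures.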

RESULTS

The results indicate that while there is no overall significant association between the type of AI model and the correctness of CPT code identification, there are notable differences in performance for simple and complex CPT codes among the models. Specifically, ChatGPT 4.0 showed higher accuracy for complex codes, whereas Perplexity.AI and Bard were more consistent with simple codes.
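The abstract does not name the statistical test behind the "no overall significant association" finding. One common choice for a model-by-correctness table is Pearson's chi-square test, sketched below under that assumption. The function name `chi_square_statistic` is hypothetical, and no counts from the study are used.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees_of_freedom) for an r x c table of counts.

    Rows could be AI models and columns the grades (correct, partially
    correct, incorrect); this layout is an assumption for illustration.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    if total == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs a nonzero total")
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof
```

For a table with 5 models and 3 grades, the degrees of freedom would be (5 - 1) × (3 - 1) = 8. The statistic would then be compared against the chi-square distribution to obtain a p-value. With small expected counts, an exact test may be more appropriate.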

DISCUSSION

The use of AI chatbots for CPT coding in craniofacial surgery presents a promising avenue for reducing the administrative burden and associated costs of manual coding. Despite the lower accuracy rates compared with specialized, trained algorithms, the accessibility and minimal training requirements of the AI chatbots make them attractive alternatives. The study also suggests that priming AI models with operative notes may enhance their accuracy, offering a resource-efficient strategy for improving CPT coding in clinical practice.

CONCLUSIONS

This study highlights the feasibility and potential benefits of integrating LLMs into the CPT coding process for craniofacial surgery. The findings advocate for further refinement and training of AI models to improve their accuracy and practicality, suggesting a future where AI-assisted coding could become a standard component of surgical workflows, aligning with the ongoing digital transformation in health care.
