Flanagan Colleen P, Trang Karen, Nacario Joyce, Schneider Peter A, Gasper Warren J, Conte Michael S, Wick Elizabeth C, Conway Allan M
Division of Vascular and Endovascular Surgery, Department of Surgery, University of California San Francisco, San Francisco, CA; Division of Clinical Informatics and Digital Transformation, Department of Medicine, University of California San Francisco, San Francisco, CA.
Division of Clinical Informatics and Digital Transformation, Department of Medicine, University of California San Francisco, San Francisco, CA; Division of General Surgery, Department of Surgery, University of California San Francisco, San Francisco, CA.
J Vasc Surg. 2025 Apr;81(4):973-982. doi: 10.1016/j.jvs.2024.12.002. Epub 2024 Dec 16.
Participation in the Vascular Quality Initiative (VQI) provides important resources to surgeons, but participation is often limited by the time and data entry personnel it requires. Large language models (LLMs) such as ChatGPT (OpenAI) are generative artificial intelligence products that may help bridge this gap. Trained on large volumes of data, these models are used for natural language processing and text generation. We evaluated the ability of LLMs to accurately populate VQI procedural databases using operative reports.
A single-center, retrospective study was performed using institutional VQI data from 2021 to 2023. The most recent procedures for carotid endarterectomy (CEA), endovascular aneurysm repair (EVAR), and infrainguinal lower extremity bypass (LEB) were analyzed using Versa, a HIPAA (Health Insurance Portability and Accountability Act)-compliant institutional version of ChatGPT. We created an automated function to analyze operative reports and generate a shareable VQI file using two models: gpt-35-turbo and gpt-4. The LLMs were accessed through a cloud-based programming interface. The outputs of each model were compared with VQI data for accuracy. We defined a metric as "unavailable" to the LLM if it was discussed by surgeons in <20% of operative reports.
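The comparison described above can be sketched as follows. This is a minimal illustration, not the study's code: it assumes each report's LLM output and its VQI reference entry are available as dictionaries keyed by field name, and that accuracy is scored by exact match per field. The function names and the example field names (`anesthesia`, `shunt_used`, `discharge_status`) are hypothetical.

```python
from statistics import median


def unavailable_fields(mention_counts, n_reports, threshold=0.20):
    """Return fields discussed in fewer than `threshold` of the reports.

    mention_counts maps a field name to the number of operative reports
    in which surgeons discussed it (assumed to be counted by hand).
    """
    return {f for f, c in mention_counts.items() if c / n_reports < threshold}


def report_accuracy(predicted, reference, exclude=frozenset()):
    """Fraction of scored reference fields where the LLM output matches."""
    fields = [f for f in reference if f not in exclude]
    if not fields:
        raise ValueError("no fields left to score")
    correct = sum(predicted.get(f) == reference[f] for f in fields)
    return correct / len(fields)


def median_accuracy(pairs, exclude=frozenset()):
    """Median per-report accuracy over (predicted, reference) pairs."""
    return median(report_accuracy(p, r, exclude) for p, r in pairs)


# Toy data with hypothetical field names: one field is rarely dictated.
counts = {"anesthesia": 50, "shunt_used": 45, "discharge_status": 4}
skip = unavailable_fields(counts, n_reports=50)

reference = {"anesthesia": "general", "shunt_used": "yes", "discharge_status": "home"}
predicted = {"anesthesia": "general", "shunt_used": "yes", "discharge_status": None}

acc_all = report_accuracy(predicted, reference)
acc_available = report_accuracy(predicted, reference, exclude=skip)
```

In this toy case the field mentioned in 4 of 50 reports (8%) falls below the 20% threshold, so excluding it raises the report's accuracy from 2 of 3 fields to 2 of 2.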
A total of 150 operative notes were analyzed, including 50 CEA, 50 EVAR, and 50 LEB. These procedural VQI databases included 25, 179, and 51 metrics, respectively. For all fields, gpt-35-turbo had a median accuracy of 84.0% for CEA (interquartile range [IQR]: 80.0%-88.0%), 92.2% for EVAR (IQR: 87.2%-94.0%), and 84.3% for LEB (IQR: 80.2%-88.1%). A total of 3 of 25, 6 of 179, and 7 of 51 VQI variables were unavailable in the operative reports, respectively. Excluding metric information routinely unavailable in operative reports, the median accuracy was 95.5% for CEA (IQR: 90.9%-100.0%), 94.8% for EVAR (IQR: 92.2%-98.5%), and 93.2% for LEB (IQR: 90.2%-96.4%). Across procedures, gpt-4 did not meaningfully improve performance compared with gpt-35-turbo (P = .97, .85, and .95 for CEA, EVAR, and LEB overall performance, respectively). The cost for 150 operative reports analyzed with gpt-35-turbo and gpt-4 was $0.12 and $3.39, respectively.
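The reported figures can be put on a per-report and per-field basis with simple arithmetic. The sketch below uses only numbers stated in the abstract; the variable names are illustrative. The final check is an observation rather than a reported result: with 22 scorable CEA fields, 21 of 22 and 20 of 22 correct round to 95.5% and 90.9%, which matches the reported CEA median and lower quartile.

```python
n_reports = 150
total_cost = {"gpt-35-turbo": 0.12, "gpt-4": 3.39}  # USD for all 150 reports

# Average cost per operative report for each model.
cost_per_report = {m: c / n_reports for m, c in total_cost.items()}

# How many times more expensive gpt-4 was for the same workload.
cost_ratio = total_cost["gpt-4"] / total_cost["gpt-35-turbo"]

# Fields that remain once routinely unavailable metrics are excluded.
total_fields = {"CEA": 25, "EVAR": 179, "LEB": 51}
unavailable = {"CEA": 3, "EVAR": 6, "LEB": 7}
available = {p: total_fields[p] - unavailable[p] for p in total_fields}

# Per-report accuracies possible with 22 scorable CEA fields.
cea_21_of_22 = round(21 / available["CEA"] * 100, 1)
cea_20_of_22 = round(20 / available["CEA"] * 100, 1)
```

On these numbers the average cost is under one tenth of a cent per report for gpt-35-turbo and a little over two cents for gpt-4, roughly a 28-fold difference.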
LLMs can accurately populate VQI procedural databases with both structured and unstructured data, while incurring only minor processing costs. Increased workflow efficiency may improve a center's ability to participate successfully in the VQI. Further work examining other VQI databases and methods to increase accuracy is needed.