Artificial Intelligence in Diagnosing and Managing Vascular Surgery Patients: An Experimental Study Using the GPT-4 Model.

Author information

Alexiou Vangelis G, Sumpio Bauer E, Vassiliou Areti, Kakkos Stavros K, Geroulakos George

Affiliations

Department of Surgery - Vascular Surgery Unit, University Hospital of Ioannina, Ioannina, Greece; Alfa Institute of Biomedical Sciences (AIBS), Athens, Greece.

Department of Vascular Surgery, Yale University School of Medicine, New Haven, CT.

Publication information

Ann Vasc Surg. 2025 Feb;111:260-267. doi: 10.1016/j.avsg.2024.11.014. Epub 2024 Nov 24.

DOI:10.1016/j.avsg.2024.11.014

PMID:39586530

Abstract

BACKGROUND

The introduction of artificial intelligence (AI) has led to groundbreaking advancements across many scientific fields. Machine learning algorithms have enabled AI models to learn, adapt, and solve complex problems in previously unimaginable ways. Natural language processing allows these models to comprehend and respond to inquiries in a natural, human-understandable way. We sought to investigate the application and performance of an AI chatbot in the diagnosis and management of vascular surgery patients.

METHODS

We conducted an experimental study to evaluate the performance of the GPT-4 AI model across 57 clinical scenarios derived from a vascular surgery textbook. Specific prompts were devised to task the AI model with identifying symptoms, diagnosing conditions, and selecting appropriate therapeutic approaches. Answers were scored, descriptive statistics were produced, and mean scores were compared across topics. The reasoning and evidence the AI used in the cases where it performed poorly were critically reviewed.
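The abstract states only that answers were scored, summarized with descriptive statistics, and compared by topic mean; it does not name the scoring scale or the statistical test. As a minimal sketch under those gaps, the code below assumes each question is scored as a fraction between 0 and 1 and uses a one-way ANOVA F statistic as one common way to compare topic means. The topic names and scores are hypothetical placeholders, not data from the study.

```python
from statistics import mean, stdev

# Hypothetical per-question scores (assumed 0-1 fraction correct), grouped by topic.
# These values are illustrative only and are NOT data from the study.
scores_by_topic = {
    "topic_A": [1.0, 0.5, 1.0, 0.75],
    "topic_B": [0.5, 1.0, 0.25, 1.0],
    "topic_C": [1.0, 1.0, 0.5, 0.75],
}


def describe(scores):
    """Return (n, mean, sample standard deviation) for one topic's scores."""
    return len(scores), mean(scores), stdev(scores)


def one_way_anova_f(groups):
    """Compute the one-way ANOVA F statistic comparing the means of several groups.

    This is an assumed choice of test; the abstract does not say which test was used.
    """
    all_scores = [s for g in groups for s in g]
    grand_mean = mean(all_scores)
    k = len(groups)          # number of topics
    n = len(all_scores)      # total number of scored questions
    # Between-group sum of squares: spread of topic means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of scores around their own topic mean.
    ss_within = sum((s - mean(g)) ** 2 for g in groups for s in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


if __name__ == "__main__":
    for topic, scores in scores_by_topic.items():
        n, m, sd = describe(scores)
        print(f"{topic}: n={n}, mean={m:.4f}, sd={sd:.4f}")
    f_stat = one_way_anova_f(list(scores_by_topic.values()))
    print(f"F = {f_stat:.4f}")
```

A small F relative to its critical value is consistent with "no statistically significant differences" between topics; obtaining a p-value would additionally require the F distribution (for example from a statistics library), which this stdlib-only sketch omits.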

RESULTS

The AI model correctly answered over 65% of the 385 questions. Performance did not differ significantly between or within the 13 vascular surgery topics. Analysis of the questions on which the model failed by more than 50% suggests a gap in its ability to interpret and process multifaceted medical information. Of these errors, 27% were attributed to a potential lack of understanding of complex clinical scenarios. The model also quoted incorrect or outdated information in 14% of cases and was unable to grasp context, nuance, and medical classification systems in 11% of cases.
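The abstract does not report the exact number of correct answers. As a worked check, "over 65% of the 385 questions" implies a lower bound on that count, since 250 correct answers would fall just short of 65%:

```latex
0.65 \times 385 = 250.25, \qquad
\frac{250}{385} \approx 64.9\% < 65\% < \frac{251}{385} \approx 65.2\%
\;\Longrightarrow\; \text{correct answers} \ge 251
```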

CONCLUSIONS

GPT-4 demonstrated potential to provide clinically relevant answers for most of the tested scenarios. However, its reasoning must still be carefully analyzed for exactitude and clinical validity. While language models show promise as valuable tools for clinicians, it is essential to recognize their role as supportive mechanisms rather than standalone solutions.
