Vangelis G. Alexiou, Bauer E. Sumpio, Areti Vassiliou, Stavros K. Kakkos, George Geroulakos
Department of Surgery - Vascular Surgery Unit, University Hospital of Ioannina, Ioannina, Greece; Alfa Institute of Biomedical Sciences (AIBS), Athens, Greece.
Department of Vascular Surgery, Yale University School of Medicine, New Haven, CT.
Ann Vasc Surg. 2025 Feb;111:260-267. doi: 10.1016/j.avsg.2024.11.014. Epub 2024 Nov 24.
The introduction of artificial intelligence (AI) has led to groundbreaking advancements across many scientific fields. Machine learning algorithms have enabled AI models to learn, adapt, and solve complex problems in previously unimaginable ways. Natural language processing allows these models to understand and respond to inquiries in a natural, readily comprehensible way. We sought to investigate the application and performance of an AI chatbot in the diagnosis and management of vascular surgery patients.
We conducted an experimental study to evaluate the performance of the GPT-4 AI model across 57 clinical scenarios derived from a vascular surgery textbook. Specific prompts were devised to present each scenario to the AI model and task it with identifying symptoms, diagnosing conditions, and selecting appropriate therapeutic approaches. Answers were scored, descriptive statistics were produced, and mean scores were compared across topics. The reasoning and evidence the model used in the cases where it performed poorly were critically reviewed.
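To make the evaluation workflow concrete, the sketch below shows one way such a pipeline could be assembled in Python, assuming the openai client library; the vignette text, prompt wording, and scoring approach are hypothetical and are not those used in the study.

    # Illustrative sketch only: the vignette, prompt wording, and model settings
    # are assumptions, not the study's actual protocol.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    scenarios = [
        {
            "id": 1,
            "vignette": "A 68-year-old smoker presents with a pulsatile abdominal mass.",
            "question": "What is the most likely diagnosis and the next step in management?",
        },
        # ... remaining textbook-derived scenarios
    ]

    def ask_model(scenario):
        """Send one clinical vignette to GPT-4 and return its free-text answer."""
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "You are assisting with vascular surgery cases."},
                {"role": "user", "content": f"{scenario['vignette']}\n\n{scenario['question']}"},
            ],
        )
        return response.choices[0].message.content

    # Each answer would then be scored against the textbook answer key,
    # e.g. answers[s["id"]] = ask_model(s) for every scenario.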
The AI model correctly answered over 65% of the 385 questions. Performance did not differ significantly between or within the 13 vascular surgery topics. Analysis of the questions that the model answered incorrectly more than 50% of the time suggests a gap in its ability to interpret and process multifaceted medical information. Twenty-seven percent of these errors were attributed to a potential lack of understanding of complex clinical scenarios. The AI model also quoted incorrect or outdated information in 14% of cases and showed an inability to comprehend context, nuance, and medical classification systems in 11% of cases.
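The abstract does not name the statistical test used to compare mean performance across the 13 topics; purely as an illustration, a one-way ANOVA on question-level scores grouped by topic could be run as follows (topic names and scores below are hypothetical placeholders).

    # Illustrative only: topic names and scores are made-up placeholders.
    from scipy import stats

    # Question-level scores (1 = correct, 0 = incorrect) grouped by topic.
    scores_by_topic = {
        "Carotid disease": [1, 1, 0, 1, 1],
        "Aortic aneurysm": [1, 0, 1, 1, 0],
        "Peripheral arterial disease": [1, 1, 1, 0, 1],
        # ... remaining topics
    }

    # One-way ANOVA across topic groups; a non-significant p-value would be
    # consistent with the reported absence of between-topic differences.
    f_stat, p_value = stats.f_oneway(*scores_by_topic.values())
    print(f"F = {f_stat:.2f}, p = {p_value:.3f}")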
GPT-4 demonstrated the potential to provide clinically relevant answers for most of the tested scenarios. However, its reasoning must still be carefully analyzed for accuracy and clinical validity. While language models show promise as valuable tools for clinicians, it is essential to recognize their role as supportive mechanisms rather than standalone solutions.