Kenig Nitzan, Monton Echeverria Javier, Muntaner Vives Aina
Department of Plastic Surgery, Quironsalud Palmaplanas Hospital, 07010 Palma, Spain.
Department of Plastic Surgery, Albacete University Hospital, 02006 Albacete, Spain.
J Clin Med. 2024 Nov 24;13(23):7108. doi: 10.3390/jcm13237108.
Background: Artificial intelligence (AI) holds promise for transforming healthcare, and AI models are increasingly used in clinical surgery. However, new AI models are being developed without established standards for their validation and use. Before AI can be widely adopted, it is crucial to ensure that these models are both accurate and safe for patients. Without proper validation, AI models risk being integrated into practice without sufficient evidence of their safety and accuracy, potentially leading to suboptimal patient outcomes. In this work, we review the current use and validation methods of AI models in clinical surgical settings and propose a novel classification system.
Methods: A systematic review was conducted in PubMed and Cochrane using the keywords "validation", "artificial intelligence", and "surgery", following the PRISMA guidelines.
Results: The search yielded 7,627 articles, of which 102 were included for data extraction, encompassing 2,837,211 patients. A validation classification system named the Surgical Validation Score (SURVAS) was developed. The models were applied primarily to risk assessment and decision-making in the preoperative setting. Validation methods were classified as high evidence in only 45% of studies, and only 14% of studies provided publicly available datasets.
Conclusions: AI has significant applications in surgery, but validation quality remains suboptimal and public data availability is limited. Current AI applications focus mainly on preoperative risk assessment and have been suggested to improve decision-making. Classification systems such as SURVAS can help clinicians confirm the degree of validity of AI models before applying them in practice.