Reinke Annika
Division of Intelligent Medical Systems and Helmholtz Imaging, German Cancer Research Center (Deutsches Krebsforschungszentrum, DKFZ) Heidelberg, Im Neuenheimer Feld 223, 69120 Heidelberg, Germany.
Chirurgie (Heidelb). 2025 Jul 11. doi: 10.1007/s00104-025-02348-2.
Artificial intelligence (AI) is increasingly being used in surgery; however, the validation of such systems is often methodologically insufficient.
Which validation issues arise in surgical AI and what requirements can be derived for clinically meaningful validation strategies?
Metric-related pitfalls reported in the literature were analyzed and combined with insights from the interdisciplinary consensus process "Metrics Reloaded" and its ongoing extension to surgical applications.
Recurring weaknesses were found at the levels of data, metrics, and reporting. Neglecting temporal structure and aggregation in video data is particularly critical.
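The aggregation problem can be made concrete with a toy calculation. The minimal Python sketch below uses entirely hypothetical data and function names (`videos`, `frame_accuracy`, `pooled_accuracy`, `per_video_accuracy`) to compare two ways of summarizing a frame-level metric over videos of different lengths: pooling all frames first, or scoring each video first and then averaging. It is an illustration under these assumptions, not the validation procedure described in the article.

```python
from statistics import mean

# Hypothetical toy data: per-video lists of (prediction, reference) frame labels.
# The videos deliberately differ in length, as surgical recordings often do.
videos = {
    "video_a": [(1, 1)] * 90 + [(0, 1)] * 10,  # 100 frames, 90 correct
    "video_b": [(0, 1)] * 8 + [(1, 1)] * 2,    # 10 frames, 2 correct
}

def frame_accuracy(pairs):
    """Fraction of frames whose prediction matches the reference label."""
    return sum(p == r for p, r in pairs) / len(pairs)

def pooled_accuracy(videos):
    """Pool the frames of all videos, then compute a single accuracy.
    Long videos dominate the result."""
    all_pairs = [pair for pairs in videos.values() for pair in pairs]
    return frame_accuracy(all_pairs)

def per_video_accuracy(videos):
    """Compute accuracy per video, then average across videos,
    so each video (e.g. each procedure) receives equal weight."""
    return mean(frame_accuracy(pairs) for pairs in videos.values())

print(f"pooled:    {pooled_accuracy(videos):.2f}")
print(f"per video: {per_video_accuracy(videos):.2f}")
```

On this toy data, pooling gives 92/110 (about 0.84), whereas the per-video mean is 0.55, because the short, poorly handled video barely affects the pooled value. Which aggregation is appropriate depends on the clinical question, which is why it should be chosen and reported explicitly.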
Structured, clinically grounded validation is essential for the safe use of surgical AI. The Metrics Reloaded framework is currently being adapted to address surgery-specific requirements.