
Deep learning prediction of error and skill in robotic prostatectomy suturing.

Author information

Sirajudeen N, Boal M, Anastasiou D, Xu J, Stoyanov D, Kelly J, Collins JW, Sridhar A, Mazomenos E, Francis NK

Affiliations

Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London (UCL), London, UK.

The Griffin Institute, Northwick Park and St Mark's Hospital, London, UK.

Publication information

Surg Endosc. 2024 Dec;38(12):7663-7671. doi: 10.1007/s00464-024-11341-5. Epub 2024 Oct 21.

Abstract

BACKGROUND

Manual objective assessment of skill and errors in minimally invasive surgery has been validated, with correlation to surgical expertise and patient outcomes. However, assessment and error annotation can be subjective and time-consuming, often precluding their use. Recent years have seen the development of artificial intelligence models that work towards automating the process, enabling error reduction and truly objective assessment. This study aimed to validate surgical skill ratings and error annotations in suturing gestures to inform the development and evaluation of AI models.

METHODS

The SAR-RARP50 open data set was blindly and independently annotated at the gesture level for Robotic-Assisted Radical Prostatectomy (RARP) suturing. Manual objective assessment tools and an error annotation methodology, Objective Clinical Human Reliability Analysis (OCHRA), were used as ground truth to train and test vision-based deep learning methods for estimating skill and errors. Analysis included descriptive statistics plus tool validity and reliability.
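The abstract does not specify the network architectures used. As an illustration only, the sketch below shows one plausible vision-based setup in which per-frame CNN features are pooled over a suturing gesture clip and fed to two heads, one regressing a manual skill score and one predicting OCHRA-annotated error presence. The GestureAssessor class, the ResNet-18 backbone and the clip dimensions are assumptions, not the authors' method.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class GestureAssessor(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = resnet18(weights=None)     # frame encoder; pretrained weights would normally be used
        backbone.fc = nn.Identity()           # expose 512-d frame features
        self.backbone = backbone
        self.skill_head = nn.Linear(512, 1)   # regresses an OSATS/M-GEARS-style score
        self.error_head = nn.Linear(512, 1)   # logit for OCHRA-annotated error presence

    def forward(self, clip):                  # clip: (batch, frames, 3, H, W)
        b, t = clip.shape[:2]
        feats = self.backbone(clip.flatten(0, 1)).view(b, t, -1).mean(dim=1)
        return self.skill_head(feats).squeeze(-1), self.error_head(feats).squeeze(-1)

# Hypothetical mini-batch of two 8-frame gesture clips with dummy labels.
model = GestureAssessor()
clip = torch.randn(2, 8, 3, 224, 224)
skill_pred, error_logit = model(clip)
loss = nn.functional.mse_loss(skill_pred, torch.tensor([3.0, 4.0])) + \
       nn.functional.binary_cross_entropy_with_logits(error_logit, torch.tensor([1.0, 0.0]))
loss.backward()
```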

RESULTS

Fifty-four RARP videos (266 min) were analysed. Strong-to-excellent inter-rater reliability (r = 0.70-0.89, p < 0.001) and a very strong correlation (r = 0.92, p < 0.001) between the objective assessment tools were demonstrated. Skill estimation of OSATS and M-GEARS yielded Spearman's correlation coefficients of 0.37 and 0.36, respectively, with normalised mean absolute errors representing prediction errors of 17.92% (inverted "accuracy" 82.08%) and 20.6% (inverted "accuracy" 79.4%), respectively. The best-performing models in error prediction achieved a mean absolute precision of 37.14%, an area under the curve of 65.10% and a Macro-F1 of 58.97%.
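To clarify how the reported figures relate, here is a minimal sketch of the evaluation metrics, assuming the normalised mean absolute error is the MAE divided by the label range and the inverted "accuracy" is simply 100% minus that percentage; the example labels and predictions are hypothetical, not the study data.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import average_precision_score, f1_score, roc_auc_score

# Hypothetical per-video skill labels (e.g. OSATS totals) and model predictions.
y_true = np.array([18.0, 22.0, 25.0, 30.0, 27.0, 20.0])
y_pred = np.array([20.0, 21.0, 27.0, 28.0, 25.0, 23.0])

rho, p = spearmanr(y_true, y_pred)                        # cf. reported 0.37 (OSATS), 0.36 (M-GEARS)
nmae = np.mean(np.abs(y_true - y_pred)) / (y_true.max() - y_true.min())
inverted_accuracy = 100.0 * (1.0 - nmae)                  # cf. reported 82.08% and 79.4%

# Hypothetical per-gesture error labels (1 = error) and predicted probabilities.
err_true = np.array([0, 1, 0, 0, 1, 1, 0, 1])
err_prob = np.array([0.2, 0.7, 0.4, 0.1, 0.6, 0.8, 0.3, 0.5])

ap = average_precision_score(err_true, err_prob)          # precision-based summary, cf. 37.14%
auc = roc_auc_score(err_true, err_prob)                   # area under the ROC curve, cf. 65.10%
macro_f1 = f1_score(err_true, (err_prob >= 0.5).astype(int), average="macro")  # cf. 58.97%
print(rho, nmae, inverted_accuracy, ap, auc, macro_f1)
```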

CONCLUSIONS

This is the first study to employ a detailed error detection methodology and deep learning models on real robotic surgical video. This benchmark evaluation of AI models sets a foundation, and offers a promising approach, for future advances in automated technical skill assessment.


Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2412/11614916/60109963e78c/464_2024_11341_Fig1_HTML.jpg
