Williams Simon C, Duvaux Dorothée, Das Adrito, Sinha Siddharth, Layard Horsfall Hugo, Funnell Jonathan P, Hanrahan John G, Khan Danyal Z, Muirhead William, Kitchen Neil, Vasconcelos Francisco, Bano Sophia, Stoyanov Danail, Grover Patrick, Marcus Hani J
Victor Horsley Department of Neurosurgery, National Hospital for Neurology and Neurosurgery, London, UK.
UCL Hawkes Institute, University College London, London, UK.
Neurosurgery. 2025 Apr 30. doi: 10.1227/neu.0000000000003466.
Machine learning (ML) in surgical video analysis offers promising prospects for training and decision support in surgery. The past decade has seen key advances in ML-based operative workflow analysis, though existing applications mostly feature shorter surgeries (<2 hours) with limited scene changes. The aim of this study was to develop and evaluate an ML model capable of automated operative workflow recognition for retrosigmoid vestibular schwannoma (VS) resection. In doing so, this project extends previous research by applying workflow prediction platforms to lengthy (median >5 hours duration), data-heavy surgeries, using VS resection as an exemplar.
A video dataset of 21 microscopic retrosigmoid VS resections was collected at a single institution over 3 years and underwent workflow annotation according to a previously agreed expert consensus (Approach, Excision, and Closure phases; and Debulking or Dissection steps within the Excision phase). Annotations were used to train an ML model consisting of a convolutional neural network and a recurrent neural network. Five-fold cross-validation was used, and performance metrics (accuracy, precision, recall, F1 score) were assessed for phase and step prediction.
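As an illustration of the cross-validation design described above, the following is a minimal sketch of one way to partition 21 videos into 5 folds. It assumes that whole videos, rather than individual frames, are assigned to folds so that no video contributes to both training and testing; the abstract does not state how the folds were constructed. The function name `video_level_folds`, the integer video IDs, and the random seed are hypothetical.

```python
import random


def video_level_folds(video_ids, k=5, seed=0):
    """Split video IDs into k (train, test) pairs, keeping each video whole.

    Assumption: folds are formed at the video level. This is a common
    choice for surgical video, not a detail confirmed by the abstract.
    """
    ids = list(video_ids)
    if k < 2 or k > len(ids):
        raise ValueError("k must be between 2 and the number of videos")

    # Shuffle reproducibly, then deal the videos round-robin into k folds.
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]

    splits = []
    for i, test in enumerate(folds):
        # Every video outside the held-out fold is used for training.
        train = [v for j, fold in enumerate(folds) if j != i for v in fold]
        splits.append((train, test))
    return splits


# 21 hypothetical video IDs, matching the dataset size in the abstract.
splits = video_level_folds(range(21), k=5)
for fold_index, (train, test) in enumerate(splits):
    print(f"fold {fold_index}: {len(train)} training videos, {len(test)} test videos")
```

With 21 videos and 5 folds, one fold holds out 5 videos and the other four hold out 4 each, so every video is tested exactly once.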
Median operative video time was 5 hours 18 minutes (IQR 3 hours 21 minutes to 6 hours 1 minute). The "Tumor Excision" phase accounted for the majority of each case (median 4 hours 23 minutes), whereas the "Approach and Exposure" (28 minutes) and "Closure" (17 minutes) phases were shorter. The ML model accurately predicted operative phases (accuracy 81%, weighted F1 0.83) and the dichotomized Excision steps, Debulking versus Dissection (accuracy 86%, weighted F1 0.86).
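To make the reported metrics concrete, the sketch below computes accuracy and support-weighted F1 from per-frame labels using the standard definitions. The toy label sequences are invented for illustration and are not study data, and the function name `classification_metrics` is hypothetical; the abstract does not specify the granularity at which predictions were scored.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return (accuracy, weighted F1, per-class metrics) for two label lists.

    Weighted F1 averages the per-class F1 scores, weighting each class by
    its number of true instances (its support).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")

    support = Counter(y_true)        # true instances per class
    predicted = Counter(y_pred)      # predicted instances per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    per_class = {}
    for label in sorted(set(y_true) | set(y_pred)):
        precision = correct[label] / predicted[label] if predicted[label] else 0.0
        recall = correct[label] / support[label] if support[label] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}

    accuracy = sum(correct.values()) / len(y_true)
    weighted_f1 = sum(per_class[c]["f1"] * support[c] for c in support) / len(y_true)
    return accuracy, weighted_f1, per_class


# Invented toy labels (not study data): 10 frames across the three phases.
y_true = ["Approach"] * 2 + ["Excision"] * 6 + ["Closure"] * 2
y_pred = ["Approach", "Excision"] + ["Excision"] * 5 + ["Closure"] + ["Closure"] * 2

accuracy, weighted_f1, per_class = classification_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, weighted F1={weighted_f1:.2f}")
```

Because the Excision phase dominates each case, support weighting means the weighted F1 largely tracks performance on that phase; per-class values are returned as well so that the shorter phases can be inspected separately.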
This study demonstrates that our ML model can accurately predict the surgical phases and intraphase steps in retrosigmoid VS resection, showing that ML-based operative workflow recognition can be applied successfully to low-volume, lengthy, data-heavy surgical videos. Nevertheless, there remains room for improvement in individual step classification. Future applications of ML in low-volume, high-complexity operations should prioritize collaborative video sharing to overcome barriers to clinical translation.