Gerats Beerend G A, Wolterink Jelmer M, Broeders Ivo A M J
Surgery Department, Meander Medical Center, Maatweg 3, Amersfoort, 3813 TZ, Utrecht, The Netherlands.
Robotics and Mechatronics, University of Twente, Drienerlolaan 5, Enschede, 7522 NB, Overijssel, The Netherlands.
Surg Endosc. 2025 Sep;39(9):5948-5956. doi: 10.1007/s00464-025-12031-6. Epub 2025 Aug 6.
Efficient operating room (OR) workflows have the potential to reduce delays and cancellations, shorten patient waiting lists, and improve satisfaction among patients and staff. Insights into OR efficiency can be extracted from the registration and timing of workflow steps. However, manual registration of these steps is often unreliable. Therefore, we propose to recognize the OR workflow automatically in videos from overhead depth cameras using deep learning. In contrast to regular cameras, depth cameras do not capture the fine video details that would permit identification of the people recorded. Hence, the privacy of patients and staff is preserved.
We gathered a video dataset of 21 laparoscopic surgeries captured by three depth cameras positioned in different corners of the OR. The procedures were annotated with four phases describing the OR workflow, i.e., turnover, anesthesia, surgery, and wrap-up. We performed an extensive analysis with spatial and temporal deep learning models, including a comparison between multi- and single-view camera setups, and contrasting post-operative with real-time predictions. Along with standard metrics for workflow recognition, we introduce a new evaluation metric that reflects the error in estimated phase duration.
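The abstract does not spell out how the phase-duration error metric is computed; a plausible formulation, sketched below under the assumption that each model emits one phase label per frame, is the mean absolute difference between predicted and ground-truth phase durations (the function name and label encoding are illustrative, not from the paper):

```python
import numpy as np

def phase_duration_mae(pred, gt, phases, fps=1.0):
    """Mean absolute error (in seconds) between predicted and true phase durations.

    pred, gt: 1-D arrays of per-frame phase labels for one recording.
    phases: iterable of phase ids (e.g. 0..3 for turnover, anesthesia, surgery, wrap-up).
    fps: label rate in frames per second, used to convert frame counts to seconds.
    """
    pred, gt = np.asarray(pred), np.asarray(gt)
    # Duration of each phase = number of frames carrying that label, divided by fps.
    errors = [abs(int((pred == p).sum()) - int((gt == p).sum())) / fps for p in phases]
    return float(np.mean(errors))

# Toy example: four phases labelled 0..3, one label per second.
gt_labels   = [0] * 60 + [1] * 120 + [2] * 300 + [3] * 60
pred_labels = [0] * 55 + [1] * 125 + [2] * 295 + [3] * 65
mae = phase_duration_mae(pred_labels, gt_labels, phases=range(4))  # 5.0 seconds
```

Unlike frame-wise accuracy, a duration-based error like this directly reflects how far scheduling estimates derived from the predictions would be off, which matches the OR-efficiency motivation of the work.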
The best-performing model, ASFormer, recognized operative phases with 99.7% mean average precision (mAP), enabling the estimation of phase duration with a mean absolute error of 35 seconds. The best-performing spatial model reached 89.7% mAP, indicating the importance of temporal modeling. We also found that the three cameras could be replaced by a single camera, with 98.8% mAP, although performance depends on the camera location in the OR. Additionally, we found that real-time prediction is feasible but underperforms relative to post-operative analysis (94.3% mAP).
Automated OR workflow recognition is possible using existing deep learning techniques based on single- and multi-camera setups. The use of privacy-preserving depth videos and a reasonably low phase duration estimation error could have positive implications for practical use.