A spatio-temporal network for video semantic segmentation in surgical videos.

Affiliations

Medtronic plc, London, UK.

Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK.

Publication information

Int J Comput Assist Radiol Surg. 2024 Feb;19(2):375-382. doi: 10.1007/s11548-023-02971-6. Epub 2023 Jun 22.

DOI: 10.1007/s11548-023-02971-6

PMID: 37347345

Abstract

PURPOSE

Semantic segmentation in surgical videos has applications in intra-operative guidance, post-operative analytics and surgical education. Models need to provide accurate and temporally consistent predictions, since inconsistent identification of anatomy across frames can compromise patient safety. We propose a novel architecture for modelling temporal relationships in videos to address these issues.

METHODS

We developed a temporal segmentation model that includes a static encoder and a spatio-temporal decoder. The encoder processes individual frames whilst the decoder learns spatio-temporal relationships from frame sequences. The decoder can be used with any suitable encoder to improve temporal consistency.
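The split described above, an encoder that sees one frame at a time and a decoder that sees a sequence, can be illustrated with a toy data-flow sketch. This is not the authors' architecture: the per-pixel linear encoder, the causal averaging window, the random weights and all names (`encode_frame`, `decode_sequence`, `segment_video`, `window`) are assumptions chosen only to show where temporal information enters.

```python
import numpy as np

def encode_frame(frame, w_enc):
    """Static encoder stand-in: per-pixel linear map (a 1x1 convolution) plus ReLU."""
    # frame: (H, W, C), w_enc: (C, F) -> features: (H, W, F)
    return np.maximum(frame @ w_enc, 0.0)

def decode_sequence(features, w_dec, window=3):
    """Spatio-temporal decoder stand-in: average each frame's features with the
    preceding frames in a causal window, then classify every pixel."""
    # features: (T, H, W, F), w_dec: (F, K) -> logits: (T, H, W, K)
    logits = []
    for t in range(features.shape[0]):
        start = max(0, t - window + 1)
        mixed = features[start:t + 1].mean(axis=0)  # temporal aggregation
        logits.append(mixed @ w_dec)
    return np.stack(logits)

def segment_video(frames, w_enc, w_dec, window=3):
    # The encoder processes frames independently; the decoder sees the sequence.
    features = np.stack([encode_frame(f, w_enc) for f in frames])
    return decode_sequence(features, w_dec, window).argmax(axis=-1)

rng = np.random.default_rng(0)
frames = rng.random((5, 8, 8, 3))      # T=5 RGB frames of 8x8 pixels (synthetic)
w_enc = rng.standard_normal((3, 16))   # hypothetical encoder weights
w_dec = rng.standard_normal((16, 4))   # hypothetical decoder weights, 4 classes
masks = segment_video(frames, w_enc, w_dec)
print(masks.shape)  # (5, 8, 8)
```

Because the encoder is only called per frame, any frame-level feature extractor with the same output shape could be substituted for `encode_frame` without changing the decoder, which mirrors the claim that the decoder can be paired with any suitable encoder.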

RESULTS

Model performance was evaluated on the CholecSeg8k dataset and a private dataset of robotic partial nephrectomy procedures. When the temporal decoder was applied, mean Intersection over Union (mIoU) improved by 1.30% and 4.27% on the two datasets, respectively. The model also improved temporal consistency by up to 7.23%.
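For readers unfamiliar with the metric, mean Intersection over Union can be computed along the following lines. This is a minimal sketch under stated assumptions: the abstract does not say how absent classes are handled or whether scores are averaged per frame or per dataset, so skipping classes missing from both masks is a choice made here, and `mean_iou` is a hypothetical helper name.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Average the per-class IoU over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both masks: skipped (an assumption)
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))

# Tiny synthetic label maps with three classes (illustrative values only)
target = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 1, 1], [1, 2, 0]])
print(mean_iou(pred, target, num_classes=3))
```

In this example the per-class IoUs are 1/3, 2/3 and 1/2, so the mean is 0.5. The abstract does not state whether the reported gains are absolute percentage points or relative changes.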

CONCLUSIONS

This work demonstrates an advance in video segmentation of surgical scenes, with potential surgical applications aimed at improving patient outcomes. The proposed decoder can extend state-of-the-art static models, and it is shown to improve both per-frame segmentation output and video temporal consistency.
