Mao Zhehua, Das Adrito, Khan Danyal Z, Williams Simon C, Hanrahan John G, Stoyanov Danail, Marcus Hani J, Bano Sophia
Department of Computer Science, University College London, London, UK.
UCL Hawkes Institute, University College London, London, UK.
Int J Comput Assist Radiol Surg. 2025 Apr 29. doi: 10.1007/s11548-025-03369-2.
Automated localization of critical anatomical structures in endoscopic pituitary surgery is crucial for enhancing patient safety and surgical outcomes. While deep learning models have shown promise in this task, their predictions often suffer from frame-to-frame inconsistency. This study addresses this issue by proposing ConsisTNet, a novel spatio-temporal model designed to improve prediction stability.
ConsisTNet leverages spatio-temporal features extracted from consecutive frames to provide both temporally and spatially consistent predictions, addressing the limitations of single-frame approaches. We employ a semi-supervised strategy, generating pseudo-labels by tracking and propagating ground-truth labels across unlabeled frames. Consistency is assessed by tracking predicted labels and comparing predictions across consecutive frames. The model is optimized and accelerated using TensorRT for real-time intraoperative guidance.
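The frame-to-frame consistency evaluation described above can be sketched as follows. This is a minimal illustration only: the abstract does not specify the propagation mechanism, so the integer-displacement warp used here (a stand-in for an optical-flow-based tracker) and the function names are assumptions.

```python
import numpy as np

def propagate_mask(mask: np.ndarray, flow: tuple[int, int]) -> np.ndarray:
    """Warp a binary mask by an integer (dy, dx) displacement.

    A stand-in for the label tracking/propagation step; a real system
    would warp by a dense optical-flow field estimated between frames.
    """
    dy, dx = flow
    h, w = mask.shape
    out = np.zeros_like(mask)
    ys, xs = np.nonzero(mask)
    ys2, xs2 = ys + dy, xs + dx
    keep = (ys2 >= 0) & (ys2 < h) & (xs2 >= 0) & (xs2 < w)
    out[ys2[keep], xs2[keep]] = 1
    return out

def consistency_iou(prev_pred: np.ndarray,
                    curr_pred: np.ndarray,
                    flow: tuple[int, int]) -> float:
    """IoU between the propagated previous-frame prediction and the
    current-frame prediction: higher means more temporally consistent."""
    warped = propagate_mask(prev_pred, flow)
    inter = np.logical_and(warped, curr_pred).sum()
    union = np.logical_or(warped, curr_pred).sum()
    return float(inter / union) if union else 1.0
```

For example, if the current prediction is exactly the previous prediction shifted by the inter-frame motion, the consistency IoU is 1.0; predictions that jitter independently of the motion score lower.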
Compared to previous state-of-the-art models, ConsisTNet significantly improves prediction consistency across video frames while maintaining high accuracy in segmentation and landmark detection. Specifically, segmentation consistency is improved by 4.56% and 9.45% in IoU for the two segmentation regions, and landmark detection consistency is enhanced with a 43.86% reduction in mean distance error. The accelerated model achieves an inference speed of 202 frames per second (FPS) at 16-bit floating-point (FP16) precision, enabling real-time intraoperative guidance.
ConsisTNet demonstrates significant improvements in spatio-temporal consistency of anatomical localization during endoscopic pituitary surgery, providing more stable and reliable real-time surgical assistance.