IEEE Trans Med Imaging. 2024 Apr;43(4):1308-1322. doi: 10.1109/TMI.2023.3335406. Epub 2024 Apr 3.
Surgical scene segmentation is a critical task in robot-assisted surgery. However, the complexity of the surgical scene, which stems mainly from local feature similarity (e.g., between different anatomical tissues), complex intraoperative artifacts, and indistinguishable boundaries, poses significant challenges to accurate segmentation. To tackle these problems, we propose the Long Strip Kernel Attention network (LSKANet) for precise segmentation of surgical images. It is built on two modules: the Dual-block Large Kernel Attention module (DLKA) and the Multiscale Affinity Feature Fusion module (MAFF). Specifically, by introducing strip convolutions with different topologies (cascaded and parallel) in two blocks, together with a large kernel design, DLKA makes full use of region- and strip-like surgical features and extracts both visual and structural information, reducing the false segmentation caused by local feature similarity. In MAFF, affinity matrices calculated from multiscale feature maps are applied as feature fusion weights, which helps to address the interference of artifacts by suppressing the activations of irrelevant regions. In addition, a hybrid loss with a Boundary Guided Head (BGH) is proposed to help the network segment indistinguishable boundaries effectively. We evaluate the proposed LSKANet on three datasets with different surgical scenes. The experimental results show that our method achieves new state-of-the-art results on all three datasets, with mIoU improvements of 2.6%, 1.4%, and 3.4%, respectively. Furthermore, our method is compatible with different backbones and can significantly increase their segmentation accuracy. Code is available at https://github.com/YubinHan73/LSKANet.
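To make the cascaded and parallel strip-convolution topologies concrete, the following is a minimal plain-Python sketch, not the authors' implementation. The function names (conv_h, conv_v, cascaded, parallel), the all-ones kernel, and the zero padding are assumptions made for illustration; the actual DLKA module is defined in the linked repository and may combine its strips differently.

```python
# Illustrative sketch only: strip convolutions on a 2D grid in plain Python.
# All names below are hypothetical and are not taken from the LSKANet code.

def conv_h(img, kernel):
    """Apply a 1 x k horizontal strip kernel with zero padding (same size)."""
    pad = len(kernel) // 2
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i, weight in enumerate(kernel):
                xx = x + i - pad
                if 0 <= xx < w:  # positions outside the image count as zero
                    acc += weight * img[y][xx]
            out[y][x] = acc
    return out


def conv_v(img, kernel):
    """Apply a k x 1 vertical strip kernel by transposing and reusing conv_h."""
    transposed = [list(col) for col in zip(*img)]
    result = conv_h(transposed, kernel)
    return [list(row) for row in zip(*result)]


def cascaded(img, kernel):
    """Horizontal strip followed by vertical strip (sequential topology)."""
    return conv_v(conv_h(img, kernel), kernel)


def parallel(img, kernel):
    """Horizontal and vertical strips applied side by side, then summed."""
    a, b = conv_h(img, kernel), conv_v(img, kernel)
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


# A single active pixel in the centre of a 5 x 5 grid.
impulse = [[0] * 5 for _ in range(5)]
impulse[2][2] = 1

region = cascaded(impulse, [1, 1, 1])  # response fills a 3 x 3 block
cross = parallel(impulse, [1, 1, 1])   # response stays on the row and column
```

On this impulse input, the cascaded form spreads the response over a full k x k region using only 2k weights, while the parallel form responds only along the row and column through the pixel. Under the assumptions above, this is one way to read the abstract's distinction between region-like and strip-like features.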