School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, 518055, China; Department of Electronic Information Engineering, Beihang University, Beijing, 100191, China.
Med Image Anal. 2024 Oct;97:103288. doi: 10.1016/j.media.2024.103288. Epub 2024 Jul 29.
Automatic polyp segmentation in endoscopic images is critical for the early diagnosis of colorectal cancer. Despite the availability of powerful segmentation models, two challenges still impede the accuracy of polyp segmentation algorithms. First, during a colonoscopy, physicians frequently adjust the orientation of the colonoscope tip to capture underlying lesions, resulting in viewpoint changes in the colonoscopy images. These variations increase the diversity of polyp appearance, making it difficult to learn robust polyp features. Second, polyps often exhibit properties similar to the surrounding tissue, leading to indistinct polyp boundaries. To address these problems, we propose a viewpoint-aware framework named VANet for precise polyp segmentation. In VANet, polyps are treated as a discriminative cue and can therefore be localized by class activation maps during a viewpoint classification process. Guided by these polyp locations, we design a viewpoint-aware Transformer (VAFormer) to alleviate the erosion of attention by the surrounding tissue, thereby inducing better polyp representations. Additionally, to enhance the network's perception of polyp boundaries, we develop a boundary-aware Transformer (BAFormer) that encourages self-attention towards uncertain regions. Together, the two modules calibrate predictions and significantly improve polyp segmentation performance. Extensive experiments on seven public datasets across six metrics show that our method achieves state-of-the-art results and that VANet handles colonoscopy images effectively in real-world scenarios. The source code is available at https://github.com/1024803482/Viewpoint-Aware-Network.
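The abstract does not give implementation details for VAFormer or BAFormer. As a rough illustration of the general idea only, the sketch below shows how a CAM-derived polyp-location prior and a boundary-uncertainty map could be injected as additive biases on self-attention logits. The module name PriorBiasedAttention, the boundary_uncertainty helper, the additive-bias formulation, and all shapes are assumptions for illustration and are not the paper's actual design.

```python
# Minimal sketch (not the authors' implementation): biasing self-attention
# over flattened patch tokens with (a) a coarse location prior such as a
# class activation map and (b) a boundary-uncertainty map. All names and
# the additive-bias formulation are hypothetical.

import torch
import torch.nn as nn


class PriorBiasedAttention(nn.Module):
    """Single-head self-attention whose logits are shifted by a spatial prior.

    Key tokens with a high prior value (likely polyp, or uncertain boundary)
    receive a larger additive bias, so every query attends to them more.
    """

    def __init__(self, dim: int, bias_scale: float = 1.0):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5
        self.bias_scale = bias_scale

    def forward(self, x: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) flattened patch tokens; prior: (B, N), values in [0, 1]
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        logits = q @ k.transpose(-2, -1) * self.scale            # (B, N, N)
        logits = logits + self.bias_scale * prior.unsqueeze(1)   # bias the key columns
        attn = logits.softmax(dim=-1)
        return self.proj(attn @ v)


def boundary_uncertainty(coarse_logits: torch.Tensor) -> torch.Tensor:
    """Turn a coarse segmentation map into an uncertainty prior:
    highest where the foreground probability is near 0.5 (ambiguous boundary)."""
    p = torch.sigmoid(coarse_logits)        # (B, 1, H, W)
    return 1.0 - 2.0 * (p - 0.5).abs()      # 1 at p = 0.5, 0 at p in {0, 1}


if __name__ == "__main__":
    B, H, W, C = 2, 16, 16, 64
    tokens = torch.randn(B, H * W, C)

    # Hypothetical CAM-derived polyp prior and coarse prediction.
    cam = torch.rand(B, 1, H, W)
    coarse = torch.randn(B, 1, H, W)

    attn = PriorBiasedAttention(dim=C)
    out_loc = attn(tokens, cam.flatten(2).squeeze(1))                                 # location prior
    out_bnd = attn(tokens, boundary_uncertainty(coarse).flatten(2).squeeze(1))        # boundary prior
    print(out_loc.shape, out_bnd.shape)  # torch.Size([2, 256, 64]) twice
```

In this reading, the same attention mechanism serves both roles: feeding it a location prior steers attention toward the suspected polyp region, while feeding it a boundary-uncertainty prior concentrates attention on ambiguous pixels near the polyp contour.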