Yuan Feiniu, Peng Yuhuan, Huang Qinghua, Li Xuelong
IEEE Trans Image Process. 2024;33:6340-6353. doi: 10.1109/TIP.2024.3482864. Epub 2024 Nov 8.
It is quite challenging to visually identify skin lesions with irregular shapes, blurred boundaries and large scale variances. A Convolutional Neural Network (CNN) extracts local features with abundant spatial information, while a Transformer is powerful at capturing global information but lacks spatial detail. To overcome the difficulty of discriminating small or blurred skin lesions, we propose a Bi-directionally Fused Boundary Aware Network (BiFBA-Net). To exploit the complementary features produced by CNNs and Transformers, we design a dual-encoding structure. Unlike existing dual-encoders, our method designs a Bi-directional Attention Gate (Bi-AG) with two inputs and two outputs for crosswise feature fusion. The Bi-AG accepts two kinds of features from the CNN and Transformer encoders, and its two attention gates generate two attention outputs that are sent back to the two encoders. Thus, multi-scale information is adequately exchanged between the CNN and Transformer encoders in a bi-directional, attention-driven manner. To faithfully restore feature maps, we propose a progressive, boundary-aware decoding structure containing three decoders with six supervised losses. The first decoder is a CNN network that produces more spatial details. The second is a Partial Decoder (PD) that aggregates high-level features with richer semantics. The last is a Boundary Aware Decoder (BAD) proposed to progressively improve boundary accuracy. The BAD uses residual structures and Reverse Attention (RA) at different scales to deeply mine structural and spatial details for refining lesion boundaries. Extensive experiments on public datasets show that BiFBA-Net achieves higher segmentation accuracy and much better boundary perception than the compared methods. It also alleviates both over-segmentation of small lesions and under-segmentation of large ones.
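To make the Bi-AG idea concrete, the following is a minimal PyTorch sketch of a bi-directional attention gate with two inputs and two gated outputs. The specific layer choices (1x1 projections, sigmoid attention heads) and channel sizes are assumptions for illustration only, not the paper's exact design.

```python
import torch
import torch.nn as nn

class BiDirectionalAttentionGate(nn.Module):
    """Illustrative sketch of a bi-directional attention gate (Bi-AG):
    two same-scale inputs (CNN and Transformer features), two attention
    maps, two gated outputs sent back to the respective encoders.
    Layer configuration is a hypothetical choice, not the published one."""
    def __init__(self, cnn_channels, trans_channels, inter_channels):
        super().__init__()
        # project both streams into a shared intermediate space
        self.theta_c = nn.Conv2d(cnn_channels, inter_channels, kernel_size=1)
        self.theta_t = nn.Conv2d(trans_channels, inter_channels, kernel_size=1)
        # one attention head per direction
        self.psi_c = nn.Sequential(nn.Conv2d(inter_channels, 1, kernel_size=1), nn.Sigmoid())
        self.psi_t = nn.Sequential(nn.Conv2d(inter_channels, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, f_cnn, f_trans):
        # joint representation of the two feature streams
        joint = torch.relu(self.theta_c(f_cnn) + self.theta_t(f_trans))
        # attention that re-weights the CNN branch using Transformer context
        a_c = self.psi_c(joint)
        # attention that re-weights the Transformer branch using CNN context
        a_t = self.psi_t(joint)
        return f_cnn * a_c, f_trans * a_t

# usage with same-scale feature maps (shapes chosen arbitrarily)
bi_ag = BiDirectionalAttentionGate(cnn_channels=64, trans_channels=96, inter_channels=32)
out_cnn, out_trans = bi_ag(torch.randn(1, 64, 56, 56), torch.randn(1, 96, 56, 56))
```

The key point the sketch captures is the two-input, two-output symmetry: each encoder branch is re-weighted by an attention map computed from both branches, so information flows in both directions at every shared scale.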
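The Boundary Aware Decoder relies on Reverse Attention, which gates side features with the complement of a coarse prediction so that refinement concentrates on uncertain boundary regions. The sketch below shows one such residual refinement step; the module name, layer widths and interpolation settings are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReverseAttentionRefiner(nn.Module):
    """Sketch of a reverse-attention refinement step: the complement of a
    coarse prediction highlights regions the coarse map is unsure about,
    gates the side features, and a residual correction is added back."""
    def __init__(self, in_channels):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, kernel_size=3, padding=1),
        )

    def forward(self, feat, coarse_logits):
        # resize the coarse prediction to the side-feature resolution
        coarse = F.interpolate(coarse_logits, size=feat.shape[2:],
                               mode="bilinear", align_corners=False)
        # reverse attention: emphasize what the coarse map does NOT cover
        rev = 1.0 - torch.sigmoid(coarse)
        gated = feat * rev
        # residual refinement of the coarse prediction
        return coarse + self.refine(gated)

# usage: refine a low-resolution prediction with a higher-resolution side feature
refiner = ReverseAttentionRefiner(in_channels=64)
refined = refiner(torch.randn(1, 64, 56, 56), torch.randn(1, 1, 14, 14))
```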