Wang Shengxiang, Li Ge, Gao Min, Zhuo Linlin, Liu Mingzhe, Ma Zhizhong, Zhao Wei, Fu Xiangzheng
School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, China.
Department of Radiology, Xiangya Hospital, Central South University, Changsha, China.
NPJ Digit Med. 2025 Jul 10;8(1):426. doi: 10.1038/s41746-025-01829-2.
Medical image segmentation is vital for accurate diagnosis. While U-Net-based models are effective, they struggle to capture long-range dependencies in complex anatomy. We propose GH-UNet, a Group-wise Hybrid Convolution-ViT model within the U-Net framework, to address this limitation. GH-UNet integrates a hybrid convolution-Transformer encoder for both local detail and global context modeling, a Group-wise Dynamic Gating (GDG) module for adaptive feature weighting, and a cascaded decoder for multi-scale integration. Both the encoder and the GDG module are modular, enabling compatibility with various CNN or ViT backbones. Extensive experiments on five public datasets and one private dataset show that GH-UNet consistently achieves superior performance. On ISIC2016, it surpasses H2Former by 1.37% in Dice and 1.94% in IoU while using only 38% of the parameters and 49.61% of the FLOPs. The code is freely available at https://github.com/xiachashuanghua/GH-UNet.
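The abstract describes the encoder as pairing convolution (local detail) with a Transformer (global context). The sketch below illustrates one plausible form of such a hybrid stage: a depthwise-separable convolutional branch fused residually with a multi-head self-attention branch. The branch structure, additive fusion, and all hyperparameters are illustrative assumptions, not the authors' design; the actual implementation is in the linked repository.

```python
# A minimal sketch of a hybrid convolution-ViT encoder stage (assumed form).
import torch
import torch.nn as nn

class HybridConvViTBlock(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Local branch: depthwise-separable convolution preserves fine detail.
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.Conv2d(channels, channels, 1),
            nn.BatchNorm2d(channels),
            nn.GELU(),
        )
        # Global branch: self-attention over the flattened spatial grid.
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local = self.local(x)
        tokens = self.norm(x.flatten(2).transpose(1, 2))   # (B, H*W, C)
        attn_out, _ = self.attn(tokens, tokens, tokens)
        global_ctx = attn_out.transpose(1, 2).reshape(b, c, h, w)
        return x + local + global_ctx                      # residual fusion
```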
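The GDG module is described only as performing "adaptive feature weighting" per channel group. A minimal sketch of that idea, assuming a squeeze-and-excitation-style gate applied independently to each channel group, follows; the group count, reduction ratio, and layer choices are hypothetical, and the authors' exact module is in the linked repository.

```python
# A minimal sketch of group-wise dynamic gating (assumed SE-style design).
import torch
import torch.nn as nn

class GroupwiseDynamicGating(nn.Module):
    def __init__(self, channels: int, groups: int = 4, reduction: int = 8):
        super().__init__()
        assert channels % groups == 0, "channels must divide evenly into groups"
        self.groups = groups
        group_ch = channels // groups
        # One lightweight gate shared across groups: squeeze each group to a
        # vector, then predict a per-channel weight in (0, 1).
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(group_ch, group_ch // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(group_ch // reduction, group_ch, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Fold the group axis into the batch so the gate acts per group.
        xg = x.reshape(b * self.groups, c // self.groups, h, w)
        weights = self.gate(xg)       # adaptive per-group channel weights
        return (xg * weights).reshape(b, c, h, w)

if __name__ == "__main__":
    gdg = GroupwiseDynamicGating(channels=64, groups=4)
    feats = torch.randn(2, 64, 32, 32)
    print(gdg(feats).shape)  # torch.Size([2, 64, 32, 32])
```

Because such a gate operates on feature maps of arbitrary channel count, it is drop-in compatible with different CNN or ViT backbones, consistent with the modularity claim in the abstract.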