Chi Jianning, Lin Geng, Li Zelan, Zhang Wenjun, Chen Jia-Hui, Huang Ying
IEEE J Biomed Health Inform. 2025 Jun;29(6):4186-4199. doi: 10.1109/JBHI.2025.3535541.
Weakly-supervised learning methods have become increasingly attractive for medical image segmentation, but they depend heavily on quantifying the pixel-wise affinities of low-level features. These features are easily corrupted in thyroid ultrasound images, so the segmentation over-fits to the weakly annotated regions and fails to delineate target boundaries precisely. We propose a dual-branch weakly-supervised learning framework that optimizes the backbone segmentation network by calibrating semantic features into a rational spatial distribution under the indirect, coarse guidance of the bounding box mask. Specifically, in the spatial arrangement consistency branch, the maximum activations sampled from the preliminary segmentation prediction and from the bounding box mask along the horizontal and vertical dimensions are compared to measure the rationality of the approximate target localization. In the hierarchical prediction consistency branch, target and background prototypes are encapsulated from the semantic features under the combined guidance of the preliminary segmentation prediction and the bounding box mask. The secondary segmentation prediction induced from these prototypes is then compared with the preliminary prediction to quantify the rationality of the refined perception of target and background semantic features. Experiments on three thyroid datasets show that our model outperforms existing weakly-supervised methods for thyroid gland and nodule segmentation and achieves performance comparable to fully-supervised methods while requiring less annotation time. The proposed method provides a weakly-supervised segmentation strategy that simultaneously considers the target's location and the rationality of the target and background semantic feature distributions. It can improve the applicability of deep-learning-based segmentation in clinical practice.
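The two consistency branches described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the loss functions or how prototypes are built, so the binary cross-entropy on the projections, the masked average pooling, the cosine-similarity classifier, the temperature `tau`, and all function names below are assumptions chosen only for illustration.

```python
import numpy as np

def projection_loss(pred, box_mask, eps=1e-7):
    """Spatial arrangement consistency (assumed form): compare the maximum
    activations of the prediction and the box mask along each image axis.
    pred, box_mask: (H, W) arrays with values in [0, 1]."""
    loss = 0.0
    for axis in (0, 1):  # max over rows, then max over columns
        p = np.clip(pred.max(axis=axis), eps, 1.0 - eps)
        t = box_mask.max(axis=axis)
        # Binary cross-entropy between the two 1-D projections (assumption).
        loss += -np.mean(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))
    return loss / 2.0

def prototype_prediction(features, pred, box_mask, tau=10.0, eps=1e-7):
    """Hierarchical prediction consistency (assumed form): build target and
    background prototypes, then derive a secondary prediction from them.
    features: (C, H, W); pred, box_mask: (H, W)."""
    fg_w = pred * box_mask   # target evidence is trusted only inside the box
    bg_w = 1.0 - fg_w        # everything else is treated as background
    # Masked average pooling over the spatial dimensions (assumption).
    fg_proto = (features * fg_w).sum(axis=(1, 2)) / (fg_w.sum() + eps)
    bg_proto = (features * bg_w).sum(axis=(1, 2)) / (bg_w.sum() + eps)

    # Cosine similarity between every pixel feature and each prototype.
    f = features / (np.linalg.norm(features, axis=0, keepdims=True) + eps)

    def cosine_map(proto):
        proto = proto / (np.linalg.norm(proto) + eps)
        return np.tensordot(proto, f, axes=(0, 0))  # shape (H, W)

    # Two-class softmax over scaled similarities, written as a sigmoid.
    logit = tau * (cosine_map(fg_proto) - cosine_map(bg_proto))
    return 1.0 / (1.0 + np.exp(-logit))

def prediction_consistency_loss(secondary, pred):
    """Mean absolute difference between the two predictions (assumption)."""
    return float(np.mean(np.abs(secondary - pred)))
```

In a training loop, both losses would be added to the objective of the backbone network; the relative weighting between them is not given in the abstract and would have to be chosen separately.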