IEEE J Biomed Health Inform. 2021 Jul;25(7):2665-2672. doi: 10.1109/JBHI.2020.3038847. Epub 2021 Jul 27.
Anatomical image segmentation is one of the foundations of medical planning. Recently, convolutional neural networks (CNNs) have achieved considerable success in segmenting volumetric (3D) images when a large number of fully annotated 3D samples are available. However, volumetric medical image datasets containing a sufficient number of segmented 3D images are rarely accessible, since providing manual segmentation masks is tedious and time-consuming. Thus, to alleviate the burden of manual annotation, we attempt to effectively train a 3D CNN using sparse annotation, where ground truth is available on just one axial 2D slice of each training 3D image. To tackle this problem, we propose a self-training framework that alternates between two steps: assigning pseudo-annotations to unlabeled voxels and updating the 3D segmentation network using both the labeled and pseudo-labeled voxels. To produce pseudo-labels more accurately, we benefit both from propagating labels (or pseudo-labels) between adjacent slices and from 3D processing of voxels. More precisely, a 2D registration-based method is proposed to gradually propagate labels between consecutive 2D slices, and a 3D U-Net is employed to exploit volumetric information. Ablation studies on benchmarks show that cooperation between the 2D registration and the 3D segmentation provides accurate pseudo-labels, enabling the segmentation network to be trained effectively even when only one expert-segmented slice is available for each training sample. Our method is assessed on the CHAOS and Visceral datasets for abdominal organ segmentation. Results demonstrate that, despite using just one segmented slice per 3D image (weaker supervision than the compared weakly supervised methods), our approach achieves higher performance and comes closer to fully supervised results.
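The abstract does not give implementation details for the slice-to-slice propagation step. As a rough illustration only, the following is a minimal sketch assuming a pure-translation registration model (phase correlation) in place of whatever 2D registration the authors actually use; all names here (`propagate_labels`, `volume`, `seed_slice_idx`, `seed_mask`) are hypothetical, not from the paper.

```python
# Minimal sketch (not the authors' implementation) of registration-based
# label propagation between adjacent axial slices. Assumes a rigid
# translation model estimated with phase correlation; the paper's actual
# 2D registration method may differ.
import numpy as np
from scipy.ndimage import shift
from skimage.registration import phase_cross_correlation

def propagate_labels(volume, seed_slice_idx, seed_mask):
    """Propagate a 2D mask from one annotated axial slice to all others.

    volume: (D, H, W) float array of image intensities.
    seed_slice_idx: index of the expert-annotated axial slice.
    seed_mask: (H, W) integer label mask for that slice.
    Returns a (D, H, W) pseudo-label volume.
    """
    depth = volume.shape[0]
    pseudo = np.zeros(volume.shape, dtype=seed_mask.dtype)
    pseudo[seed_slice_idx] = seed_mask

    # Sweep upward and downward from the seed slice, registering each
    # unlabeled slice to its already (pseudo-)labeled neighbor and warping
    # the neighbor's mask accordingly.
    for direction in (+1, -1):
        prev = seed_slice_idx
        mask = seed_mask.astype(float)
        stop = depth if direction > 0 else -1
        for z in range(seed_slice_idx + direction, stop, direction):
            # Estimate the 2D shift that aligns the neighbor slice to slice z.
            offset, _, _ = phase_cross_correlation(
                volume[z], volume[prev], upsample_factor=4
            )
            # Nearest-neighbor interpolation keeps label values crisp.
            mask = shift(mask, offset, order=0)
            pseudo[z] = mask.astype(seed_mask.dtype)
            prev = z
    return pseudo
```

In the full framework described above, such propagated masks would be combined with 3D U-Net predictions to form the pseudo-labels used in the alternating self-training loop; that fusion and the network update step are omitted from this sketch.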