Institute of Medical Robotics, School of Biomedical Engineering, Shanghai Jiao Tong University, No.800 Dongchuan Road, Shanghai 200240, China.
Key Laboratory of Biomechanics and Mechanobiology (Beihang University) of Ministry of Education, Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, Beijing 100083, China.
Med Image Anal. 2022 Jan;75:102258. doi: 10.1016/j.media.2021.102258. Epub 2021 Oct 10.
In this paper, we address the problem of fully automatic labeling and segmentation of 3D vertebrae in arbitrary Field-Of-View (FOV) CT images. We propose a deep learning-based two-stage solution to tackle these two problems. More specifically, in the first stage, the challenging vertebra labeling problem is solved via a novel transformer-based 3D object detector that views automatic detection of vertebrae in arbitrary FOV CT scans as a one-to-one set prediction problem. The main components of the new method, called Spine-Transformers, are a one-to-one set-based global loss that enforces unique predictions, and a lightweight 3D transformer architecture equipped with a skip connection and learnable positional embeddings for the encoder and decoder, respectively. We additionally propose an inscribed sphere-based object detector to replace the regular box-based object detector for better handling of volume orientation variations. Our method reasons about the relationships among different vertebra levels and the global volume context to directly infer all vertebrae in parallel. In the second stage, the segmentation of the identified vertebrae and the refinement of the detected centers are performed by a single multi-task encoder-decoder network trained for all vertebrae, since the network does not need to identify which vertebra it is working on. The two tasks share a common encoder path but use separate decoder paths. Comprehensive experiments are conducted on two public datasets and one in-house dataset. The experimental results demonstrate the efficacy of the present approach.
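The one-to-one set prediction idea in the abstract can be illustrated with a small matcher sketch. DETR-style detectors typically solve this assignment with the Hungarian algorithm over a cost that combines classification and localization terms; the brute-force version below (function name and cost layout are illustrative assumptions, not the paper's implementation) shows the core constraint — each ground-truth vertebra is matched to exactly one query, forcing unique predictions.

```python
from itertools import permutations

def one_to_one_match(cost):
    """Exhaustive one-to-one assignment for a small cost matrix.

    cost: Q rows (query predictions) by G columns (ground-truth vertebrae),
    with Q >= G; cost[q][g] would typically combine a classification term
    and a center-distance term. Returns a list `assign` where assign[g]
    is the row (query) uniquely matched to ground-truth column g.
    Real systems use the Hungarian algorithm instead of this O(Q!/(Q-G)!)
    enumeration; the result is the same minimal-cost unique matching.
    """
    q, g = len(cost), len(cost[0])
    best, best_assign = float("inf"), None
    for rows in permutations(range(q), g):  # every injective query->GT map
        total = sum(cost[r][c] for c, r in enumerate(rows))
        if total < best:
            best, best_assign = total, list(rows)
    return best_assign

# Example: 3 queries, 2 ground-truth vertebrae; low cost marks a good match.
cost = [[0.1, 10.95],
        [10.9, 0.2],
        [13.7, 13.7]]
print(one_to_one_match(cost))  # query 0 -> GT 0, query 1 -> GT 1
```

Because the matching is injective, no two queries can claim the same vertebra, which is exactly what the one-to-one set-based global loss enforces during training.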
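The motivation for the inscribed sphere-based detector can also be sketched geometrically: the overlap between two spheres depends only on their radii and center distance, so a sphere-based overlap measure is invariant to how the volume is oriented, whereas an axis-aligned box IoU is not. The helper below (a standalone illustration, not the paper's loss) computes sphere IoU from the standard sphere-cap intersection formula.

```python
import math

def sphere_iou(c1, r1, c2, r2):
    """Intersection-over-union of two spheres.

    Depends only on center distance and radii, so it is unchanged when the
    CT volume is rotated -- unlike an axis-aligned bounding-box IoU.
    """
    d = math.dist(c1, c2)
    v1 = 4.0 / 3.0 * math.pi * r1 ** 3
    v2 = 4.0 / 3.0 * math.pi * r2 ** 3
    if d >= r1 + r2:              # disjoint spheres
        inter = 0.0
    elif d <= abs(r1 - r2):       # one sphere contained in the other
        inter = min(v1, v2)
    else:                         # lens-shaped partial overlap
        inter = (math.pi * (r1 + r2 - d) ** 2 *
                 (d * d + 2 * d * (r1 + r2) - 3 * (r1 - r2) ** 2)) / (12 * d)
    return inter / (v1 + v2 - inter)

# Two unit spheres one radius apart overlap partially, in any orientation.
print(round(sphere_iou((0, 0, 0), 1.0, (1, 0, 0), 1.0), 3))
```

Rotating both centers by the same rigid transform leaves `d` unchanged, so the score is identical, which is the property that makes a sphere parameterization attractive for vertebrae in arbitrarily oriented FOVs.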