Tian Ye, Zhu Jingqiang, Zhang Lei, Mou Lichao, Zhu Xiaoxiang, Shi Yilei, Ma Buyun, Zhao Wanjun
Department of Ultrasonography, West China Hospital of Sichuan University.
Department of Thyroid Surgery, West China Hospital of Sichuan University.
J Vis Exp. 2023 Apr 21;(194). doi: 10.3791/64480.
In recent years, the incidence of thyroid cancer has been increasing. Thyroid nodule detection is critical for both the diagnosis and treatment of thyroid cancer. Convolutional neural networks (CNNs) have achieved good results in thyroid ultrasound image analysis tasks. However, due to the limited effective receptive field of convolutional layers, CNNs fail to capture long-range contextual dependencies, which are important for identifying thyroid nodules in ultrasound images. Transformer networks are effective in capturing long-range contextual information. Inspired by this, we propose a novel thyroid nodule detection method that combines a Swin Transformer backbone with Faster R-CNN. Specifically, an ultrasound image is first projected into a 1D sequence of patch embeddings, which are then fed into a hierarchical Swin Transformer. The Swin Transformer backbone extracts features at five different scales by utilizing shifted windows for the computation of self-attention. Subsequently, a feature pyramid network (FPN) is used to fuse the features from the different scales. Finally, a detection head predicts bounding boxes and the corresponding confidence scores. Data collected from 2,680 patients were used to conduct the experiments, and the results showed that this method achieved the best mAP score of 44.8%, outperforming CNN-based baselines. In addition, the method achieved higher sensitivity (90.5%) than the competing approaches. This indicates that the context modeling in this model is effective for thyroid nodule detection.
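To make the shifted-window idea concrete, the following is a minimal NumPy sketch (not the authors' code) of how a feature map is partitioned into non-overlapping windows for self-attention, and how the alternate layers cyclically shift the map by half a window so that information can flow across window boundaries. The function names and the toy feature-map sizes are illustrative assumptions.

```python
import numpy as np

def window_partition(x, window_size):
    # Split an (H, W, C) feature map into non-overlapping
    # (window_size x window_size) windows; self-attention is then
    # computed independently within each window.
    H, W, C = x.shape
    x = x.reshape(H // window_size, window_size,
                  W // window_size, window_size, C)
    # -> (num_windows, tokens_per_window, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window_size * window_size, C)

def shifted_window_partition(x, window_size):
    # Cyclically shift the map by half a window before partitioning,
    # as in Swin's alternating SW-MSA layers, so each new window
    # spans pixels from several windows of the previous layer.
    shift = window_size // 2
    shifted = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    return window_partition(shifted, window_size)

# Toy example: an 8x8 feature map with 4 channels and 4x4 windows.
feat = np.arange(8 * 8 * 4, dtype=np.float32).reshape(8, 8, 4)
regular = window_partition(feat, 4)          # 4 windows of 16 tokens each
shifted = shifted_window_partition(feat, 4)  # same count, different grouping
print(regular.shape, shifted.shape)          # (4, 16, 4) (4, 16, 4)
```

In the full model, a windowed multi-head self-attention layer operates on the token axis of each window, and the cyclic shift is undone afterwards; this sketch only shows the regrouping step that gives the backbone its long-range context at low cost.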
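The FPN fusion step can likewise be sketched in a few lines. The snippet below is a simplified, untrained illustration (assumed names, random 1x1 projections, nearest-neighbour upsampling) of the top-down pathway that merges coarse, semantically strong features into the finer scales before the detection head runs.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour 2x upsampling of an (H, W, C) map.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fpn_fuse(features, out_channels, rng):
    # Top-down FPN fusion: project each backbone scale to a common
    # channel width with a 1x1 projection (random here, learned in
    # practice), then add each upsampled coarser map into the next
    # finer lateral map. `features` is ordered fine -> coarse.
    laterals = []
    for f in features:
        w = rng.standard_normal((f.shape[-1], out_channels)).astype(np.float32)
        laterals.append(f @ w)  # a 1x1 conv is a per-pixel matmul
    fused = [laterals[-1]]      # start from the coarsest level
    for lat in reversed(laterals[:-1]):
        fused.insert(0, lat + upsample2x(fused[0]))
    return fused

# Toy pyramid: three scales with doubling channel width.
rng = np.random.default_rng(0)
feats = [rng.standard_normal((16, 16, 32)).astype(np.float32),
         rng.standard_normal((8, 8, 64)).astype(np.float32),
         rng.standard_normal((4, 4, 128)).astype(np.float32)]
fused = fpn_fuse(feats, 256, rng)
print([f.shape for f in fused])  # [(16, 16, 256), (8, 8, 256), (4, 4, 256)]
```

Each fused map keeps its spatial resolution but carries a uniform channel width, so a single detection head can slide over all scales to predict bounding boxes and confidence scores for nodules of different sizes.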