School of Instrumentation and Optoelectronics Engineering, Beihang University, Beijing, China.
School of Computer Science and Engineering, Beihang University, Beijing, China.
Med Phys. 2022 Sep;49(9):5787-5798. doi: 10.1002/mp.15852. Epub 2022 Jul 30.
PURPOSE: Breast cancer is the most commonly occurring cancer worldwide. Ultrasound reflectivity imaging can be used to obtain breast ultrasound (BUS) images, from which benign and malignant tumors can be classified. However, such classification is subjective and depends on the experience and skill of operators and doctors. Automatic classification methods can assist doctors and improve objectivity, but current convolutional neural networks (CNNs) are not good at learning global features, and vision transformers (ViTs) are not good at extracting local features. In this study, we proposed a visual geometry group attention ViT (VGGA-ViT) network to overcome these disadvantages.

METHODS: In the proposed method, we used a CNN module to extract local features and employed a ViT module to learn the global relationships among different regions and enhance the relevant local features. The CNN module, named the VGGA module, was composed of a VGG backbone, a feature-extraction fully connected layer, and a squeeze-and-excitation (SE) block. Both the VGG backbone and the ViT module were pretrained on the ImageNet dataset and retrained on BUS samples in this study. Two BUS datasets were employed for validation.

RESULTS: Cross-validation was conducted on the two BUS datasets. On Dataset A, the proposed VGGA-ViT network achieved high accuracy (88.71 ± 1.55%), recall (90.73 ± 1.57%), specificity (85.58 ± 3.35%), precision (90.77 ± 1.98%), F1 score (90.73 ± 1.24%), and Matthews correlation coefficient (MCC) (76.34 ± 3.29%), all better than those of the previous networks compared in this study. Dataset B was used as a separate test set; on it, the VGGA-ViT achieved the highest accuracy (81.72 ± 2.99%), recall (64.45 ± 2.96%), specificity (90.28 ± 3.51%), precision (77.08 ± 7.21%), F1 score (70.11 ± 4.25%), and MCC (57.64 ± 6.88%).
CONCLUSIONS: In this study, we proposed the VGGA-ViT for BUS classification, which learns both local and global features well. The proposed network achieved higher accuracy than the previous methods compared.
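The squeeze-and-excitation block in the VGGA module re-weights the CNN feature channels with a learned per-channel gate (global average pooling, a bottleneck MLP, then a sigmoid). The following is a minimal sketch of such a block, assuming NumPy arrays in channel-first layout; the weight shapes, reduction ratio, and example sizes are illustrative assumptions, since the abstract does not specify the paper's exact layer dimensions:

```python
import numpy as np

def squeeze_and_excitation(feature_map, w1, w2):
    """Channel-attention sketch of a squeeze-and-excitation block.
    Assumed shapes: feature_map (C, H, W); w1 (C, C // r); w2 (C // r, C),
    where r is the reduction ratio (hypothetical here, r = 4)."""
    # Squeeze: global average pooling over the spatial dimensions -> (C,)
    z = feature_map.mean(axis=(1, 2))
    # Excitation: bottleneck MLP, ReLU then sigmoid, giving one gate per channel
    s = np.maximum(z @ w1, 0.0)                 # ReLU
    gate = 1.0 / (1.0 + np.exp(-(s @ w2)))      # sigmoid, values in (0, 1)
    # Rescale each channel of the feature map by its gate
    return feature_map * gate[:, None, None]

# Toy example: 8 channels on a 5x5 feature map, reduction ratio r = 4
rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 5, 5))
w1 = rng.standard_normal((8, 2))
w2 = rng.standard_normal((2, 8))
out = squeeze_and_excitation(fmap, w1, w2)
```

Because the gate lies in (0, 1), the block can only attenuate channels, which is how it emphasizes the most relevant local features before they are passed on.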