Wang Wenhan, Zhou Jiale, Zhao Jin, Lin Xun, Zhang Yan, Lu Shan, Zhao Wanchen, Wang Shuai, Tang Wenzhong, Qu Xiaolei
School of Instrumentation and Optoelectronics Engineering, Beihang University, Beijing, China.
Breast and Thyroid Surgery, China-Japan Friendship Hospital, Beijing, China.
Ultrasound Med Biol. 2025 Mar;51(3):525-534. doi: 10.1016/j.ultrasmedbio.2024.11.014. Epub 2024 Dec 20.
Breast ultrasound (BUS) is used to differentiate benign from malignant breast tumors, and automatic classification can reduce reader subjectivity. However, current convolutional neural networks (CNNs) struggle to capture global features, while vision transformer (ViT) networks are limited in extracting local features. Therefore, this study aimed to develop a deep learning method that enables interaction and updating of intermediate features between a CNN and a ViT to achieve high-accuracy BUS image classification.
This study introduced the CNN and transformer multi-stage fusion network (CTMF-Net), which consists of two branches: a CNN branch and a transformer branch. The CNN branch employs the Visual Geometry Group (VGG) network as its backbone, while the transformer branch uses ViT as its base network. Both branches are divided into four stages. At the end of each stage, a proposed feature interaction module enables interaction and fusion of features between the two branches. Additionally, the convolutional block attention module (CBAM) is applied after each stage of the CNN branch to enhance relevant features. Extensive experiments were conducted comparing CTMF-Net with various state-of-the-art deep-learning classification methods on three public breast ultrasound datasets (SYSU, UDIAT and BUSI).
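To make the stage-wise design above concrete, the following is a minimal PyTorch sketch of a two-branch, multi-stage fusion network in the spirit of CTMF-Net. The stage widths, patch size, simplified CBAM, and the pooled-summary exchange inside the FeatureInteraction block are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a CNN + ViT two-branch network with per-stage fusion.
# All hyperparameters and the fusion mechanism are assumptions for illustration.
import torch
import torch.nn as nn


class CBAM(nn.Module):
    """Simplified convolutional block attention: channel then spatial attention."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        self.spatial = nn.Sequential(nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        x = x * self.channel_mlp(x)                       # channel attention
        pooled = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * self.spatial(pooled)                   # spatial attention


class FeatureInteraction(nn.Module):
    """Hypothetical fusion block: exchange pooled summaries between the CNN
    feature map and the transformer tokens at the end of each stage."""
    def __init__(self, channels, dim):
        super().__init__()
        self.to_tokens = nn.Linear(channels, dim)
        self.to_maps = nn.Linear(dim, channels)

    def forward(self, feat_map, tokens):
        b, c, h, w = feat_map.shape
        # CNN -> transformer: project a pooled map summary and add it to the tokens.
        map_summary = feat_map.flatten(2).transpose(1, 2).mean(1, keepdim=True)
        tokens = tokens + self.to_tokens(map_summary)
        # transformer -> CNN: project the mean token back onto the feature map.
        feat_map = feat_map + self.to_maps(tokens.mean(1)).view(b, c, 1, 1)
        return feat_map, tokens


class CTMFNetSketch(nn.Module):
    def __init__(self, num_classes=2, dim=192, stages=(32, 64, 128, 256)):
        super().__init__()
        self.patch_embed = nn.Conv2d(1, dim, kernel_size=16, stride=16)  # ViT-style patches
        self.cnn_stages, self.cbams, self.vit_stages, self.fusions = (
            nn.ModuleList(), nn.ModuleList(), nn.ModuleList(), nn.ModuleList())
        in_ch = 1
        for ch in stages:
            self.cnn_stages.append(nn.Sequential(         # VGG-style stage: conv-conv-pool
                nn.Conv2d(in_ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.MaxPool2d(2)))
            self.cbams.append(CBAM(ch))
            self.vit_stages.append(nn.TransformerEncoderLayer(
                d_model=dim, nhead=4, dim_feedforward=dim * 2, batch_first=True))
            self.fusions.append(FeatureInteraction(ch, dim))
            in_ch = ch
        self.head = nn.Linear(stages[-1] + dim, num_classes)

    def forward(self, x):
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)   # B x N x dim
        feat = x
        for stage, cbam, vit, fuse in zip(self.cnn_stages, self.cbams,
                                          self.vit_stages, self.fusions):
            feat = cbam(stage(feat))           # CNN stage followed by CBAM refinement
            tokens = vit(tokens)               # transformer stage
            feat, tokens = fuse(feat, tokens)  # feature interaction at stage end
        pooled = torch.cat([feat.mean(dim=(2, 3)), tokens.mean(dim=1)], dim=1)
        return self.head(pooled)


if __name__ == "__main__":
    logits = CTMFNetSketch()(torch.randn(2, 1, 224, 224))  # grayscale BUS images
    print(logits.shape)  # torch.Size([2, 2])
```

In this sketch each stage only exchanges pooled summaries between the feature map and the token sequence; the feature interaction module described in the paper may differ substantially.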
For internal validation on SYSU and UDIAT, the proposed CTMF-Net achieved the highest accuracy, 90.14 ± 0.58% on SYSU and 92.04 ± 4.90% on UDIAT, showing superior classification performance over the other state-of-the-art networks (p < 0.05). For external validation on BUSI, CTMF-Net also performed best, achieving the highest area under the curve (AUC) of 0.8704 when trained on SYSU, a 0.0126 improvement over the second-best visual geometry group attention ViT method. Similarly, when trained on UDIAT, CTMF-Net achieved an AUC of 0.8505, surpassing the second-best global context ViT method by 0.0130.
The proposed CTMF-Net outperforms the compared state-of-the-art methods and can effectively assist physicians in classifying breast tumors more accurately.