Annu Int Conf IEEE Eng Med Biol Soc. 2022 Jul;2022:480-483. doi: 10.1109/EMBC48229.2022.9871809.
Medical ultrasound (US) imaging has become a prominent modality for breast cancer imaging due to its ease of use, low cost, and safety. In the past decade, convolutional neural networks (CNNs) have emerged as the method of choice in vision applications and have shown excellent potential in the automatic classification of US images. Despite their success, their restricted local receptive field limits their ability to learn global context information. Recently, Vision Transformer (ViT) designs, based on self-attention between image patches, have shown great potential as an alternative to CNNs. In this study, for the first time, we utilize ViT to classify breast US images using different augmentation strategies. We also adopt a weighted cross-entropy loss function, since breast ultrasound datasets are often imbalanced. Results are reported as classification accuracy and Area Under the Curve (AUC), and performance is compared with state-of-the-art (SOTA) CNNs. The results indicate that ViT models perform comparably to, or even better than, CNNs in the classification of breast US images. Clinical relevance: This work shows the potential of Vision Transformers for the automatic classification of masses in breast ultrasound, which can help clinicians diagnose and make treatment decisions more precisely.
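The abstract describes the training setup only briefly. As a concrete illustration, the sketch below shows one way such a pipeline could look in PyTorch. This is a minimal sketch under stated assumptions, not the authors' released code: the timm model name, the augmentation choices, the per-class counts, and the inverse-frequency weighting scheme are all illustrative.

```python
# A minimal sketch, assuming a PyTorch + timm setup (the abstract does not name
# the implementation): fine-tuning a ViT on breast US images with augmentation
# and a weighted cross-entropy loss for class imbalance, evaluated by AUC.
import torch
import torch.nn as nn
import timm
from torchvision import transforms
from sklearn.metrics import roc_auc_score

# Illustrative augmentation pipeline; the paper compares several strategies.
train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(degrees=10),
    transforms.ToTensor(),
])

# Hypothetical per-class counts for an imbalanced benign/malignant/normal set;
# inverse-frequency weighting is one common choice (the exact scheme is an
# assumption, not taken from the paper).
class_counts = torch.tensor([437.0, 210.0, 133.0])
weights = class_counts.sum() / (len(class_counts) * class_counts)

model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=3)
criterion = nn.CrossEntropyLoss(weight=weights)   # weighted cross-entropy
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def train_step(images, labels):
    """One optimization step. images: (B, 3, 224, 224), labels: (B,)."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def evaluate_auc(images, labels):
    """One-vs-rest AUC over softmax scores, matching the reported AUC metric."""
    model.eval()
    probs = model(images).softmax(dim=1)
    return roc_auc_score(labels.numpy(), probs.numpy(), multi_class="ovr")
```

Weighting the cross-entropy loss by inverse class frequency makes errors on the minority classes cost proportionally more, which counteracts the tendency of the majority class to dominate gradient updates on imbalanced breast US datasets.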