Suppr 超能文献



Advancing breast cancer diagnosis: token vision transformers for faster and accurate classification of histopathology images.

Authors

Abimouloud Mouhamed Laid, Bensid Khaled, Elleuch Mohamed, Ammar Mohamed Ben, Kherallah Monji

Affiliations

National Engineering School of Sfax, University of Sfax, Sfax, Tunisia.

Advanced Technologies for Environment and Smart Cities (ATES Unit), Sfax University, Sfax, Tunisia.

Publication

Vis Comput Ind Biomed Art. 2025 Jan 8;8(1):1. doi: 10.1186/s42492-024-00181-8.
PMID: 39775534
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11711433/
Abstract

The vision transformer (ViT) architecture, with its attention mechanism based on multi-head attention layers, has been widely adopted in various computer-aided diagnosis tasks due to its effectiveness in processing medical image information. ViTs are notably recognized for their complex architecture, which requires high-performance GPUs or CPUs for efficient model training and deployment in real-world medical diagnostic devices; this renders them more intricate than convolutional neural networks (CNNs). The challenge is compounded in histopathology image analysis, where the images are both limited in number and complex. In response, this study proposes TokenMixer, a hybrid architecture that combines the strengths of CNNs and ViTs. The architecture aims to improve feature extraction and classification accuracy with shorter training time and fewer parameters by minimizing the number of input patches employed during training: input patches are tokenized using convolutional layers, and encoder transformer layers process the tokenized patches across all network layers for fast and accurate breast cancer tumor subtype classification. The TokenMixer mechanism is inspired by the ConvMixer and TokenLearner models. First, the ConvMixer model dynamically generates spatial attention maps using convolutional layers, enabling the extraction of patches from input images to minimize the number of input patches used in training. Second, the TokenLearner model extracts relevant regions from the selected input patches, tokenizes them to improve feature extraction, and trains all tokenized patches in an encoder transformer network. We evaluated the TokenMixer model on the public BreakHis dataset, comparing it with ViT-based and other state-of-the-art methods. Our approach achieved strong results for both binary and multi-class classification of breast cancer subtypes across magnification levels (40×, 100×, 200×, 400×): accuracies of 97.02% for binary classification and 93.29% for multi-class classification, with decision times of 391.71 s and 1173.56 s, respectively. These results highlight the potential of our hybrid deep ViT-CNN architecture for advancing tumor classification in histopathological images. The source code is available at https://github.com/abimouloud/TokenMixer.
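The core idea the abstract describes, reducing the token count by attention-weighted spatial pooling in the style of TokenLearner, can be sketched in a few lines. The following is an illustrative NumPy simplification, not the authors' implementation (which uses convolutional layers and is available at the linked repository); the projection matrix `w`, the 14×14×64 feature map, and the choice of 8 tokens are assumptions chosen only for the example.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def token_learner(feature_map, w):
    """TokenLearner-style pooling: turn an (H, W, C) feature map into
    S learned tokens of dimension C, where S = w.shape[1].

    Each column of `w` produces one spatial attention map; the token is
    the attention-weighted average of the feature map under that map.
    """
    H, W, C = feature_map.shape
    flat = feature_map.reshape(H * W, C)   # (H*W, C)
    logits = flat @ w                      # (H*W, S) attention logits
    attn = softmax(logits, axis=0)         # softmax over spatial positions
    tokens = attn.T @ flat                 # (S, C) pooled tokens
    return tokens

# Toy shapes: a 14x14 feature map with 64 channels, pooled to 8 tokens.
rng = np.random.default_rng(0)
fm = rng.normal(size=(14, 14, 64))
w = rng.normal(size=(64, 8))
tokens = token_learner(fm, w)
print(tokens.shape)  # (8, 64)
```

Instead of the quadratic cost of attending over all 196 patch positions, the downstream transformer encoder only has to process the 8 pooled tokens, which is the source of the training-time and parameter savings the abstract claims.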


[Figures 1-17 and algorithm listings a-e are available in the PMC full text linked above.]

Similar Articles

1. Advancing breast cancer diagnosis: token vision transformers for faster and accurate classification of histopathology images.
   Vis Comput Ind Biomed Art. 2025 Jan 8;8(1):1. doi: 10.1186/s42492-024-00181-8.
2. HTC-retina: A hybrid retinal diseases classification model using transformer-Convolutional Neural Network from optical coherence tomography images.
   Comput Biol Med. 2024 Aug;178:108726. doi: 10.1016/j.compbiomed.2024.108726. Epub 2024 Jun 9.
3. Enhanced Pneumonia Detection in Chest X-Rays Using Hybrid Convolutional and Vision Transformer Networks.
   Curr Med Imaging. 2025;21:e15734056326685. doi: 10.2174/0115734056326685250101113959.
4. RT-ViT: Real-Time Monocular Depth Estimation Using Lightweight Vision Transformers.
   Sensors (Basel). 2022 May 19;22(10):3849. doi: 10.3390/s22103849.
5. From Binary to Multi-Class Classification: A Two-Step Hybrid CNN-ViT Model for Chest Disease Classification Based on X-Ray Images.
   Diagnostics (Basel). 2024 Dec 6;14(23):2754. doi: 10.3390/diagnostics14232754.
6. TAC-UNet: transformer-assisted convolutional neural network for medical image segmentation.
   Quant Imaging Med Surg. 2024 Dec 5;14(12):8824-8839. doi: 10.21037/qims-24-1229. Epub 2024 Nov 5.
7. Brain tumor segmentation and detection in MRI using convolutional neural networks and VGG16.
   Cancer Biomark. 2025 Mar;42(3):18758592241311184. doi: 10.1177/18758592241311184. Epub 2025 Apr 4.
8. MuSiC-ViT: A multi-task Siamese convolutional vision transformer for differentiating change from no-change in follow-up chest radiographs.
   Med Image Anal. 2023 Oct;89:102894. doi: 10.1016/j.media.2023.102894. Epub 2023 Jul 12.
9. Fusing global context with multiscale context for enhanced breast cancer classification.
   Sci Rep. 2024 Nov 9;14(1):27358. doi: 10.1038/s41598-024-78363-w.
10. Attention-Based Deep Learning Approach for Breast Cancer Histopathological Image Multi-Classification.
   Diagnostics (Basel). 2024 Jul 1;14(13):1402. doi: 10.3390/diagnostics14131402.

References Cited in This Article

1. CrossFormer++: A Versatile Vision Transformer Hinging on Cross-Scale Attention.
   IEEE Trans Pattern Anal Mach Intell. 2024 May;46(5):3123-3136. doi: 10.1109/TPAMI.2023.3341806. Epub 2024 Apr 3.
2. Convolution Neural Network for Breast Cancer Detection and Classification Using Deep Learning.
   Asian Pac J Cancer Prev. 2023 Feb 1;24(2):531-544. doi: 10.31557/APJCP.2023.24.2.531.
3. FabNet: A Features Agglomeration-Based Convolutional Neural Network for Multiscale Breast Cancer Histopathology Images Classification.
   Cancers (Basel). 2023 Feb 5;15(4):1013. doi: 10.3390/cancers15041013.
4. Classification of benign and malignant subtypes of breast cancer histopathology imaging using hybrid CNN-LSTM based transfer learning.
   BMC Med Imaging. 2023 Jan 30;23(1):19. doi: 10.1186/s12880-023-00964-0.
5. Vision-Transformer-Based Transfer Learning for Mammogram Classification.
   Diagnostics (Basel). 2023 Jan 4;13(2):178. doi: 10.3390/diagnostics13020178.
6. Deep Learning Based Methods for Breast Cancer Diagnosis: A Systematic Review and Future Direction.
   Diagnostics (Basel). 2023 Jan 3;13(1):161. doi: 10.3390/diagnostics13010161.
7. Breast cancer histopathological images classification based on deep semantic features and gray level co-occurrence matrix.
   PLoS One. 2022 May 5;17(5):e0267955. doi: 10.1371/journal.pone.0267955. eCollection 2022.
8. Breast cancer detection using artificial intelligence techniques: A systematic literature review.
   Artif Intell Med. 2022 May;127:102276. doi: 10.1016/j.artmed.2022.102276. Epub 2022 Mar 5.
9. Deep learning model for fully automated breast cancer detection system from thermograms.
   PLoS One. 2022 Jan 14;17(1):e0262349. doi: 10.1371/journal.pone.0262349. eCollection 2022.
10. Global patterns of breast cancer incidence and mortality: A population-based cancer registry data analysis from 2000 to 2020.
   Cancer Commun (Lond). 2021 Nov;41(11):1183-1194. doi: 10.1002/cac2.12207. Epub 2021 Aug 16.