

Classification of Mobile-Based Oral Cancer Images Using the Vision Transformer and the Swin Transformer.

Author Information

Song Bofan, Kc Dharma Raj, Yang Rubin Yuchan, Li Shaobai, Zhang Chicheng, Liang Rongguang

Affiliations

Wyant College of Optical Sciences, The University of Arizona, Tucson, AZ 85721, USA.

Computer Science Department, The University of Arizona, Tucson, AZ 85721, USA.

Publication Information

Cancers (Basel). 2024 Feb 29;16(5):987. doi: 10.3390/cancers16050987.

Abstract

Oral cancer, a pervasive and rapidly growing malignant disease, poses a significant global health concern. Early and accurate diagnosis is pivotal for improving patient outcomes. Automatic diagnosis methods based on artificial intelligence have shown promising results in the oral cancer field, but the accuracy still needs to be improved for realistic diagnostic scenarios. Vision Transformers (ViTs) have recently outperformed convolutional neural network (CNN) models in many computer vision benchmark tasks. This study explores the effectiveness of the Vision Transformer and the Swin Transformer, two cutting-edge variants of the transformer architecture, for the mobile-based oral cancer image classification application. The pre-trained Swin Transformer model achieved 88.7% accuracy in the binary classification task, outperforming the ViT model by 2.3%, while the conventional convolutional network models VGG19 and ResNet50 achieved 85.2% and 84.5% accuracy, respectively. Our experiments demonstrate that these transformer-based architectures outperform traditional convolutional neural networks in oral cancer image classification, and underscore the potential of the ViT and the Swin Transformer in advancing the state of the art in oral cancer image analysis.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6784/10931180/d1932e867de1/cancers-16-00987-g001.jpg
