Pacal Ishak, Alaftekin Melek, Zengul Ferhat Devrim
Department of Computer Engineering, Igdir University, 76000, Igdir, Turkey.
Department of Health Services Administration, The University of Alabama at Birmingham, Birmingham, AL, USA.
J Imaging Inform Med. 2024 Dec;37(6):3174-3192. doi: 10.1007/s10278-024-01140-8. Epub 2024 Jun 5.
Skin cancer is one of the most common cancers worldwide, and early detection is crucial for effective treatment. Dermatologists often face large data volumes, the risk of human error, and strict time constraints, all of which can negatively affect diagnostic outcomes. Deep learning-based diagnostic systems offer fast, accurate testing and expanded research capabilities, and can provide significant support to dermatologists. In this study, we enhanced the Swin Transformer architecture by replacing the conventional shifted window-based multi-head self-attention (SW-MSA) with a hybrid shifted window-based multi-head self-attention (HSW-MSA). This change enables the model to process overlapping skin cancer regions more efficiently, capture finer details, and model long-range dependencies while preserving memory and computational efficiency during training. Additionally, we replace the standard multi-layer perceptron (MLP) in the Swin Transformer with a SwiGLU-based MLP, a variant of the gated linear unit (GLU) module, to achieve higher accuracy, faster training, and better parameter efficiency. The modified Swin-Base model was evaluated on the publicly available eight-class ISIC 2019 skin lesion dataset and compared against popular convolutional neural networks (CNNs) and state-of-the-art vision transformer (ViT) models. In an exhaustive evaluation on the unseen test set, the proposed Swin-Base model achieved an accuracy of 89.36%, a recall of 85.13%, a precision of 88.22%, and an F1-score of 86.65%, surpassing all previously reported studies and deep learning models in the literature.
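The SwiGLU-based MLP mentioned above replaces the two-layer feed-forward block with a gated one: one projection is passed through SiLU (Swish with beta = 1) and multiplied element-wise with a second projection before the output projection. The NumPy sketch below illustrates that general technique only. The class `SwiGLUMLP`, the helper `standard_mlp_params`, the bias-free layers, the initialisation, and the hidden width of 8d/3 (chosen so the parameter count roughly matches a standard MLP with ratio 4) are all assumptions for illustration; the abstract does not give the paper's exact configuration.

```python
import numpy as np


def silu(z: np.ndarray) -> np.ndarray:
    """SiLU / Swish with beta = 1: z * sigmoid(z)."""
    return z / (1.0 + np.exp(-z))


class SwiGLUMLP:
    """Hypothetical SwiGLU feed-forward block: (SiLU(x W_gate) * (x W_up)) W_down.

    Biases are omitted and weights are randomly initialised; both are
    illustrative assumptions, not the paper's reported setup.
    """

    def __init__(self, d_model: int, hidden: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        scale_in = 1.0 / np.sqrt(d_model)
        scale_hidden = 1.0 / np.sqrt(hidden)
        self.w_gate = rng.normal(0.0, scale_in, (d_model, hidden))
        self.w_up = rng.normal(0.0, scale_in, (d_model, hidden))
        self.w_down = rng.normal(0.0, scale_hidden, (hidden, d_model))

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x has shape (num_tokens, d_model)
        gated = silu(x @ self.w_gate) * (x @ self.w_up)  # element-wise gating
        return gated @ self.w_down  # project back to d_model

    def num_params(self) -> int:
        return self.w_gate.size + self.w_up.size + self.w_down.size


def standard_mlp_params(d_model: int, mlp_ratio: int = 4) -> int:
    """Weights of a plain d -> ratio*d -> d MLP, biases ignored (hypothetical helper)."""
    return 2 * d_model * mlp_ratio * d_model


d_model = 128                  # channel width of Swin-B's first stage
hidden = int(8 * d_model / 3)  # 341: assumed width keeping parameters near a 4x MLP
mlp = SwiGLUMLP(d_model, hidden)
tokens = np.random.default_rng(1).normal(size=(49, d_model))  # one 7x7 window
out = mlp(tokens)
print(out.shape)                                       # (49, 128)
print(mlp.num_params(), standard_mlp_params(d_model))  # 130944 131072
```

Because SwiGLU uses three weight matrices instead of two, shrinking the hidden width to about 8d/3 is a common way to keep the parameter budget comparable (3 * 128 * 341 = 130,944 versus 2 * 128 * 512 = 131,072 here); whether the authors used this scaling is not stated in the abstract.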