Ayana Gelan, Wako Beshatu Debela, Park So-Yun, Kong Jude, Han Sahng Min, Yoon Soon-Do, Choe Se-Woon
Department of Biomedical Engineering, Kumoh National Institute of Technology, Gumi 39253, Republic of Korea.
School of Biomedical Engineering, Jimma Institute of Technology, Jimma University, Jimma 378, Ethiopia.
Diagnostics (Basel). 2025 Jul 3;15(13):1698. doi: 10.3390/diagnostics15131698.
The global spread of Monkeypox (Mpox) has highlighted the urgent need for rapid, accurate diagnostic tools. Traditional methods such as polymerase chain reaction (PCR) are resource-intensive, whereas skin image-based detection offers a promising alternative. This study evaluates the effectiveness of vision transformers (ViTs) for automated Mpox detection. By fine-tuning a pre-trained ViT model on an Mpox lesion image dataset, a robust ViT-based transfer learning (TL) model was created. Performance was assessed against convolutional neural network (CNN)-based TL models and ViT models trained from scratch using key metrics: accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC). Furthermore, a transferability measure was used to assess how effectively learned features transfer to Mpox images. The results show that the ViT model outperformed the CNN models, achieving an AUC of 0.948 and an accuracy of 0.942 with a p-value of less than 0.05 across all metrics, highlighting its potential for accurate and scalable Mpox detection. Moreover, the ViT models yielded a better hypothesis margin-based transferability score, highlighting their effectiveness in transferring useful learned weights to Mpox images. Gradient-weighted Class Activation Mapping (Grad-CAM) visualizations also confirmed that the ViT model attends to clinically relevant lesion features, supporting its interpretability and reliability for diagnostic use. These results suggest that the ViT offers superior accuracy, making it a valuable tool for early Mpox detection in field settings, especially where conventional diagnostics are limited. This approach could support faster outbreak response and improved resource allocation in public health systems.
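To make the described workflow concrete, the sketch below shows a minimal ViT-based transfer-learning pipeline of the kind the abstract reports: fine-tuning an ImageNet-pre-trained ViT on a binary Mpox lesion dataset and evaluating accuracy, precision, recall, F1-score, and AUC. The dataset paths, the ViT-B/16 backbone, and all hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of ViT transfer learning for binary Mpox skin-image classification.
# Assumes a folder-structured dataset (e.g., "mpox/" vs "other/"); paths and
# hyperparameters below are hypothetical, not taken from the paper.
import torch
import timm
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Standard ImageNet preprocessing expected by the pre-trained ViT backbone.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

train_ds = datasets.ImageFolder("data/train", transform=preprocess)  # hypothetical path
test_ds = datasets.ImageFolder("data/test", transform=preprocess)    # hypothetical path
train_dl = DataLoader(train_ds, batch_size=32, shuffle=True)
test_dl = DataLoader(test_ds, batch_size=32)

# Load an ImageNet-pre-trained ViT and replace its head for 2 classes (Mpox vs non-Mpox).
model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=2).to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4)

# Fine-tune the whole network on the Mpox lesion images.
model.train()
for epoch in range(10):
    for images, labels in train_dl:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

# Evaluate with the metrics reported in the abstract.
model.eval()
probs, preds, targets = [], [], []
with torch.no_grad():
    for images, labels in test_dl:
        p = torch.softmax(model(images.to(device)), dim=1)[:, 1].cpu()
        probs.extend(p.tolist())
        preds.extend((p > 0.5).long().tolist())
        targets.extend(labels.tolist())

print("accuracy :", accuracy_score(targets, preds))
print("precision:", precision_score(targets, preds))
print("recall   :", recall_score(targets, preds))
print("F1-score :", f1_score(targets, preds))
print("AUC      :", roc_auc_score(targets, probs))
```

The same fine-tuned model could then be passed to a Grad-CAM implementation to visualize which lesion regions drive the prediction, as the study does for interpretability.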