School of Information Science and Engineering, Wuhan University of Science and Technology, Wuhan, Hubei, China.
PLoS One. 2024 Jul 2;19(7):e0298102. doi: 10.1371/journal.pone.0298102. eCollection 2024.
Brain tumors pose a significant threat to health, and their early detection and classification are crucial. Diagnosis currently relies heavily on pathologists' time-consuming morphological examination of brain images, which can yield subjective results and misdiagnoses. To address these challenges, this study proposes an improved Vision Transformer (ViT)-based algorithm for human brain tumor classification. To overcome the small size of existing datasets, Homomorphic Filtering, Channels Contrast Limited Adaptive Histogram Equalization, and Unsharp Masking are applied to enrich the dataset images, enhancing their information content and improving model generalization. To address the limitation of the Vision Transformer's self-attention structure in capturing input token sequences, a novel relative position encoding method is employed, improving the model's overall predictive capability. Furthermore, residual structures introduced into the Multi-Layer Perceptron counter convergence degradation during training, yielding faster convergence and higher accuracy. Finally, the model's performance on the validation sets is analyzed in terms of accuracy, precision, and recall. Experimental results show that the proposed model achieves a classification accuracy of 91.36% on an augmented open-source brain tumor dataset, surpassing the original ViT-B/16 by 5.54%. This supports the effectiveness of the proposed approach for brain tumor classification and offers a potential reference for clinical diagnosis.
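The abstract reports accuracy, precision, and recall but does not say how they were computed. As an illustration only, the following minimal sketch shows one common way to derive these three metrics for a multi-class classifier from true and predicted labels. The function name `classification_metrics`, the placeholder class labels, and the choice to report precision and recall per class are assumptions made here, not details taken from the study.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return (accuracy, per-class precision, per-class recall).

    Hypothetical helper for illustration; not the study's evaluation code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()

    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    accuracy = sum(tp.values()) / len(y_true)
    # A class with no predictions (or no true samples) is scored 0.0 here;
    # that convention is an assumption of this sketch.
    precision = {
        c: tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0 for c in labels
    }
    recall = {
        c: tp[c] / (tp[c] + fn[c]) if (tp[c] + fn[c]) else 0.0 for c in labels
    }
    return accuracy, precision, recall


if __name__ == "__main__":
    # Placeholder labels; the dataset's actual class names are not given here.
    truth = ["a", "a", "b", "b", "c", "c"]
    preds = ["a", "b", "b", "b", "c", "a"]
    acc, prec, rec = classification_metrics(truth, preds)
    print(f"accuracy={acc:.4f}")
    print("precision:", prec)
    print("recall:", rec)
```

Per-class values can be averaged (for example, an unweighted mean across classes) to obtain a single precision or recall figure; which averaging scheme the study used is not stated in the abstract.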