The College of Software, Xinjiang University, Urumqi, 830046, China.
The College of Information Science and Engineering, Xinjiang University, Urumqi, 830046, China.
J Cancer Res Clin Oncol. 2023 Nov;149(17):16075-16086. doi: 10.1007/s00432-023-05223-x. Epub 2023 Sep 12.
The application of deep learning methods to intelligent disease diagnosis has been a focus of intelligent medical research. In image classification tasks where the lesion area is small and unevenly distributed, background regions included in training degrade the final accuracy of lesion grading. Rather than following the traditional approach of building a CNN-based intelligent system to assist physicians in diagnosis, we propose a pure transformer framework for the diagnostic grading of pathological images.
We propose SMiT, a Symmetric Mask pre-Training vision Transformer model for grading pathological cancer images. SMiT pre-trains the vision transformer by applying an identical, symmetric, high-probability sparsification of the input image token sequence at the first and last encoder layers; the baseline model's parameters are then fine-tuned after loading the pre-trained weights. This allows the model to concentrate on extracting fine-grained features in the lesion region and effectively mitigates the latent feature-dependency problem.
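The abstract does not specify the exact sparsification procedure; the sketch below illustrates one plausible reading, in which a single high-probability random token mask is generated once and the identical index set is applied symmetrically at both the first and last encoder layer positions. The function name `mask_tokens`, the 75% mask ratio, and the 196×768 ViT-Base token shape are illustrative assumptions, not details from the paper.

```python
import numpy as np

def mask_tokens(tokens, mask_ratio=0.75, rng=None):
    """Drop a high proportion of tokens from a patch-embedding sequence.

    tokens: (N, D) array of patch embeddings.
    Returns the kept tokens and the kept indices, so the SAME mask
    can be reapplied symmetrically at another encoder layer.
    """
    rng = rng or np.random.default_rng(0)
    n = tokens.shape[0]
    n_keep = max(1, int(round(n * (1.0 - mask_ratio))))
    keep = np.sort(rng.choice(n, size=n_keep, replace=False))
    return tokens[keep], keep

# Symmetric use: generate the mask once, then apply the identical
# index set before the first encoder layer and again at the last.
tokens = np.random.randn(196, 768)   # 14x14 patches, ViT-Base dim (assumed)
kept, idx = mask_tokens(tokens, mask_ratio=0.75)
print(kept.shape)                    # (49, 768)
```

Reusing the same index set at both positions is what makes the sparsification "symmetrically identical" in this reading; a fresh mask per layer would not be.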
SMiT achieved 92.8% classification accuracy on 4500 colorectal cancer histopathological images denoised with a Gaussian filter. We further validated the effectiveness and generalizability of the method on the publicly available APTOS2019 diabetic retinopathy dataset from Kaggle, achieving a quadratic Cohen's kappa of 91.9%, an accuracy of 86.91%, and an F1-score of 72.85%, which are 1-2% better than previous CNN-based studies.
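The quadratic Cohen's kappa reported above is the standard agreement metric for ordinal grading tasks such as diabetic retinopathy severity (grades 0-4). A minimal NumPy implementation, shown here only to make the metric concrete (the toy labels are invented, not data from the paper):

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Quadratic Cohen's kappa for ordinal class labels 0..n_classes-1."""
    # Observed confusion matrix.
    O = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        O[t, p] += 1
    # Quadratic disagreement weights: penalty grows with (i - j)^2.
    w = np.array([[(i - j) ** 2 for j in range(n_classes)]
                  for i in range(n_classes)]) / (n_classes - 1) ** 2
    # Expected matrix under chance agreement (outer product of marginals).
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()
    return 1.0 - (w * O).sum() / (w * E).sum()

# Toy example: 5 DR grades, mostly correct predictions.
y_true = [0, 1, 2, 3, 4, 2, 1]
y_pred = [0, 1, 2, 3, 3, 2, 0]
print(round(quadratic_weighted_kappa(y_true, y_pred, 5), 3))  # 0.905
```

Because the weights are quadratic, an off-by-one grading error (4 predicted as 3) is penalized far less than a gross error, which is why this metric is preferred over plain accuracy for severity grading.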
SMiT uses a simpler strategy to achieve better results in assisting physicians to make accurate clinical decisions, and may serve as inspiration for making good use of vision transformers in medical imaging.