Khanal Bidur, Shrestha Prashant, Amgain Sanskar, Khanal Bishesh, Bhattarai Binod, Linte Cristian A
Annu Int Conf IEEE Eng Med Biol Soc. 2024 Jul;2024:1-6. doi: 10.1109/EMBC53108.2024.10782929.
Label noise in medical image classification datasets significantly hampers the training of supervised deep learning methods, undermining their generalizability. The test performance of a model tends to decrease as the label noise rate increases. In recent years, several methods have been proposed to mitigate the impact of label noise in medical image classification and to enhance model robustness. Predominantly, these works have employed CNN-based architectures as the backbone of their classifiers for feature extraction. More recently, however, Vision Transformer (ViT)-based backbones have been replacing CNNs, demonstrating improved performance and a greater ability to learn generalizable features, especially on large datasets. Nevertheless, no prior work has rigorously investigated how transformer-based backbones handle the impact of label noise in medical image classification. In this paper, we investigate the architectural robustness of ViT against label noise and compare it to that of CNNs. We use two medical image classification datasets, COVID-DU-Ex and NCT-CRC-HE-100K, both corrupted by injecting label noise at various rates. Additionally, we show that pretraining is crucial for ensuring ViT's improved robustness against label noise in supervised training.
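The abstract's corruption protocol, injecting label noise at various rates, is not specified further here; a minimal sketch of one common variant (symmetric noise, where a chosen fraction of labels is flipped uniformly to a different class) could look as follows. The function name `inject_label_noise` and its signature are illustrative assumptions, not the authors' code.

```python
import numpy as np

def inject_label_noise(labels, noise_rate, num_classes, seed=0):
    """Symmetric label noise: corrupt a fraction `noise_rate` of the labels,
    replacing each selected label with a different, uniformly chosen class."""
    rng = np.random.default_rng(seed)
    noisy = np.asarray(labels).copy()
    n = len(noisy)
    # pick which samples to corrupt, without replacement
    flip_idx = rng.choice(n, size=int(round(noise_rate * n)), replace=False)
    for i in flip_idx:
        # draw from the other classes so the label is guaranteed to change
        choices = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(choices)
    return noisy
```

Rounding `noise_rate * n` means the realized corruption fraction matches the nominal rate exactly on evenly divisible dataset sizes; asymmetric (class-conditional) noise would instead flip labels according to a class-to-class confusion matrix.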