IEEE J Biomed Health Inform. 2022 Jul;26(7):2983-2994. doi: 10.1109/JBHI.2022.3162748. Epub 2022 Jul 1.
In many clinical settings, medical image datasets suffer from class imbalance, which biases the predictions of trained models toward the majority classes. Semi-supervised learning (SSL) algorithms trained on such imbalanced datasets become even more problematic, since the pseudo-supervision for unlabeled data is generated from the model's biased predictions. To address these issues, we propose a novel semi-supervised deep learning method, i.e., uncertainty-guided virtual adversarial training (VAT) with batch nuclear-norm (BNN) optimization, for large-scale medical image classification. To effectively exploit useful information from both labeled and unlabeled data, we leverage VAT and BNN optimization to harness the underlying knowledge, which helps to improve the discriminability, diversity, and generalization of the trained models. More concretely, our network is trained by minimizing a combination of four losses: a supervised cross-entropy loss, a BNN loss defined on the output matrix of the labeled data batch (lBNN loss), a negative BNN loss defined on the output matrix of the unlabeled data batch (uBNN loss), and a VAT loss on both labeled and unlabeled data. We additionally propose to use uncertainty estimation to filter out unlabeled samples near the decision boundary when computing the VAT loss. We conduct comprehensive experiments on two publicly available datasets and one in-house dataset to evaluate the performance of our method. The experimental results demonstrate that our method achieves better results than state-of-the-art SSL methods.
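A minimal sketch of the combined training objective described above, written in PyTorch. The abstract does not specify the backbone, loss weights, or the exact uncertainty estimator, so the entropy-based filter, the loss weights, and the one-step power-iteration VAT below are illustrative assumptions, not the authors' reference implementation; it only mirrors the abstract's wording (minimize the BNN term on the labeled batch, maximize it, via a negative loss, on the unlabeled batch).

```python
# Hedged sketch of the four-term objective: supervised CE + lBNN - uBNN + VAT,
# with an (assumed) entropy-based uncertainty filter on unlabeled VAT samples.
import torch
import torch.nn.functional as F


def batch_nuclear_norm(logits: torch.Tensor) -> torch.Tensor:
    """Nuclear norm of the batch prediction matrix (batch_size x num_classes)."""
    probs = F.softmax(logits, dim=1)
    return torch.linalg.svdvals(probs).sum()


def vat_loss(model, x, xi=1e-6, eps=2.0, mask=None):
    """Simplified virtual adversarial training loss (one power-iteration step)."""
    with torch.no_grad():
        p = F.softmax(model(x), dim=1)
    # Random unit perturbation, refined by one power-iteration step.
    d = F.normalize(torch.randn_like(x).flatten(1), dim=1).view_as(x)
    d.requires_grad_(True)
    p_hat = F.log_softmax(model(x + xi * d), dim=1)
    adv_dist = F.kl_div(p_hat, p, reduction="batchmean")
    grad = torch.autograd.grad(adv_dist, d)[0]
    r_adv = eps * F.normalize(grad.flatten(1), dim=1).view_as(x)
    # Per-sample KL between clean and adversarially perturbed predictions.
    p_adv = F.log_softmax(model(x + r_adv.detach()), dim=1)
    kl = F.kl_div(p_adv, p, reduction="none").sum(dim=1)
    if mask is not None:  # keep only low-uncertainty samples
        kl = kl * mask
    return kl.mean()


def total_loss(model, x_l, y_l, x_u, w_lbnn=1.0, w_ubnn=1.0, w_vat=1.0,
               entropy_threshold=1.0):
    logits_l = model(x_l)
    logits_u = model(x_u)

    # (1) Supervised cross-entropy on the labeled batch.
    ce = F.cross_entropy(logits_l, y_l)

    # (2) lBNN loss: nuclear norm of the labeled output matrix (minimized).
    lbnn = batch_nuclear_norm(logits_l)

    # (3) uBNN loss: negative nuclear norm of the unlabeled output matrix,
    #     i.e., the nuclear norm itself is maximized to encourage
    #     discriminability and diversity.
    ubnn = -batch_nuclear_norm(logits_u)

    # (4) VAT on labeled and unlabeled data; unlabeled samples with high
    #     predictive entropy (near the decision boundary) are filtered out.
    #     Entropy is only a stand-in for the paper's uncertainty estimate.
    with torch.no_grad():
        p_u = F.softmax(logits_u, dim=1)
        entropy = -(p_u * p_u.clamp_min(1e-8).log()).sum(dim=1)
        mask_u = (entropy < entropy_threshold).float()
    vat = vat_loss(model, x_l) + vat_loss(model, x_u, mask=mask_u)

    return ce + w_lbnn * lbnn + w_ubnn * ubnn + w_vat * vat
```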