Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, College of Computer and Control Engineering, Minjiang University, Fuzhou, China; College of Computer and Data Science, Fuzhou University, Fuzhou, China.
Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, College of Computer and Control Engineering, Minjiang University, Fuzhou, China; College of Mechanical and Electrical Engineering, Fujian Agriculture and Forestry University, Fuzhou, China.
Comput Biol Med. 2023 Sep;164:107280. doi: 10.1016/j.compbiomed.2023.107280. Epub 2023 Jul 22.
Despite the success of deep neural networks in medical image classification, the problem remains challenging: data annotation is time-consuming, and class distributions are imbalanced because many diseases are relatively rare. To address this problem, we propose Class-Specific Distribution Alignment (CSDA), a semi-supervised learning framework based on self-training that is suited to learning from highly imbalanced datasets. Specifically, we first offer a new perspective on distribution alignment by viewing the process as a change of basis in the vector space spanned by marginal predictions, and then derive CSDA to capture class-dependent marginal predictions on both labeled and unlabeled data, thereby avoiding bias towards majority classes. Furthermore, we propose a Variable Condition Queue (VCQ) module that maintains a proportionately balanced number of unlabeled samples for each class. Experiments on three public datasets, HAM10000, CheXpert, and Kvasir, show that our method achieves competitive performance on semi-supervised skin disease, thoracic disease, and endoscopic image classification tasks.
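For reference, the distribution alignment step that CSDA reinterprets is commonly implemented as in ReMixMatch. Each prediction p on an unlabeled sample is multiplied elementwise by the ratio of the labeled class prior p_L to a running estimate p_U of the model's mean prediction on unlabeled data, then renormalized. Equivalently, p is multiplied by the diagonal matrix diag(p_L / p_U) before normalization. The following is a minimal pure-Python sketch of that baseline only, under stated assumptions. The class name `DistributionAligner`, the sliding-window estimate of p_U, and the `window` and `eps` parameters are hypothetical choices made for illustration. The abstract does not specify CSDA's class-specific formulation or the VCQ module, so neither is reproduced here.

```python
from collections import deque
from typing import List, Sequence


class DistributionAligner:
    """Baseline distribution alignment in the style of ReMixMatch.

    NOT the paper's CSDA: it only illustrates the standard step that CSDA
    reinterprets. A prediction p is rescaled by p_L / p_U (labeled class
    prior over the running mean of unlabeled predictions), then renormalized.
    """

    def __init__(self, labeled_marginal: Sequence[float], window: int = 128):
        # Class prior estimated from the labeled set (assumed to be given).
        self.labeled_marginal = list(labeled_marginal)
        # Hypothetical window: number of recent batch means kept to estimate p_U.
        self.history: deque = deque(maxlen=window)

    def update(self, batch_probs: Sequence[Sequence[float]]) -> None:
        """Record the mean predicted distribution of one unlabeled batch."""
        k = len(self.labeled_marginal)
        n = len(batch_probs)
        self.history.append([sum(p[c] for p in batch_probs) / n for c in range(k)])

    def unlabeled_marginal(self) -> List[float]:
        """Running estimate p_U: average of the stored batch means."""
        k = len(self.labeled_marginal)
        m = len(self.history)
        return [sum(h[c] for h in self.history) / m for c in range(k)]

    def align(self, probs: Sequence[float], eps: float = 1e-12) -> List[float]:
        """Return Normalize(p * p_L / p_U); identity until p_U is available."""
        if not self.history:
            return list(probs)
        p_u = self.unlabeled_marginal()
        scaled = [p * l / max(u, eps)
                  for p, l, u in zip(probs, self.labeled_marginal, p_u)]
        total = sum(scaled)
        return [s / total for s in scaled]


if __name__ == "__main__":
    # Toy 2-class example: the model over-predicts class 0 on unlabeled data.
    aligner = DistributionAligner(labeled_marginal=[0.5, 0.5])
    aligner.update([[0.75, 0.25], [0.75, 0.25]])  # p_U becomes [0.75, 0.25]
    print(aligner.align([0.75, 0.25]))  # -> [0.5, 0.5]
```

On a highly imbalanced dataset p_L is itself skewed towards majority classes, so this baseline pulls pseudo-labels towards that skewed prior; the abstract motivates CSDA's class-dependent marginals as a way to avoid such bias towards majority classes.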