Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, OR, USA.
Electrical and Computer System Engineering, Faculty of Engineering, Monash University, Clayton, Victoria, Australia.
Transl Vis Sci Technol. 2023 Aug 1;12(8):6. doi: 10.1167/tvst.12.8.6.
Imbalanced datasets in medical applications can degrade the performance of deep learning methods. This study investigates whether convolutional neural networks (CNNs) for glaucoma diagnosis can be improved by addressing class imbalance with a semi-supervised learning approach that uses glaucoma suspect samples. Such samples are often excluded from studies because they are a mixture of healthy and preperimetric glaucomatous eyes.
A baseline 3D CNN was developed and trained on a real-world glaucoma dataset, which is naturally imbalanced, like many other real-world medical datasets. Three methods were then applied to assess their impact on the performance of the trained models: reweighting samples, resampling data to form balanced batches, and semi-supervised learning on glaucoma suspect data.
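The three strategies named above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the weighting formula, the batch sampler, or the semi-supervised algorithm, so inverse-frequency weights, equal per-class sampling with replacement, and confidence-thresholded pseudo-labeling are assumed here as common choices. All function names, the toy labels, and the 0.9 threshold are hypothetical.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed weighting: n_samples / (n_classes * class_count) per class."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def balanced_batch(labels, batch_size, rng):
    """Draw indices so every class contributes equally to the batch.

    Sampling is with replacement, so minority samples may repeat.
    """
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    per_class = batch_size // len(by_class)
    batch = []
    for idxs in by_class.values():
        batch.extend(rng.choices(idxs, k=per_class))
    rng.shuffle(batch)
    return batch


def pseudo_label(probabilities, threshold=0.9):
    """Assumed semi-supervised step: label unlabeled (suspect) samples
    only when the predicted P(class 1) is confidently high or low.

    Returns a list of (index, label) pairs; uncertain samples are skipped.
    """
    out = []
    for i, p in enumerate(probabilities):
        if p >= threshold:
            out.append((i, 1))
        elif p <= 1 - threshold:
            out.append((i, 0))
    return out


# Hypothetical toy data: class 0 is the minority, class 1 the majority.
labels = [0] * 20 + [1] * 80
weights = inverse_frequency_weights(labels)
batch = balanced_batch(labels, batch_size=16, rng=random.Random(0))
suspects = pseudo_label([0.95, 0.5, 0.03])
```

In a training loop, `weights` would scale the per-class loss terms, `batch` would index the labeled data for one step, and `suspects` would add confidently predicted suspect eyes to the labeled pool.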
The proposed method achieved a mean accuracy of 95.24%, an F1 score of 97.42%, and an area under the receiver operating characteristic curve (AUC ROC) of 95.64%, whereas traditional supervised training with weighted cross-entropy loss achieved 92.88%, 96.12%, and 92.72%, respectively. The improvements were statistically significant for all metrics.
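For readers less familiar with the three reported metrics, the following sketch shows how accuracy, F1 score, and AUC ROC are conventionally computed for a binary classifier. The toy labels, scores, and the 0.5 decision threshold are hypothetical and unrelated to the study's data. The AUC is computed with the pairwise-ranking (Mann-Whitney) formulation, which is one standard equivalent of the area under the ROC curve.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc_roc(y_true, scores):
    """Probability that a random positive scores above a random negative.

    Ties count as half a win. Assumes both classes are present.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = auc_roc(y_true, scores)
```

Accuracy and F1 depend on the chosen decision threshold, whereas AUC ROC depends only on how the scores rank positives against negatives, which is why the three metrics can move differently on imbalanced data.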
Exploiting glaucoma suspect eyes in a semi-supervised learning method coupled with resampling can improve glaucoma diagnosis performance by mitigating imbalanced learning issues.
Imbalanced clinical datasets may negatively affect medical applications of deep learning. Using data with an uncertain diagnosis, such as glaucoma suspects, through a combination of semi-supervised learning and class-imbalanced learning strategies can partially address the problems of limited data and class imbalance.