Tohye Tewodros Gizaw, Qin Zhiguang, Al-Antari Mugahed A, Ukwuoma Chiagoziem C, Lonseko Zenebe Markos, Gu Yeong Hyeon
School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China.
Department of Artificial Intelligence and Data Science, College of AI Convergence, Daeyang AI Center, Sejong University, Seoul 05006, Republic of Korea.
Bioengineering (Basel). 2024 Aug 31;11(9):887. doi: 10.3390/bioengineering11090887.
Glaucoma, a leading cause of visual impairment worldwide, poses notable diagnostic challenges owing to its initially asymptomatic presentation. Early identification is vital to prevent irreversible vision loss. Cutting-edge deep learning techniques, such as vision transformers (ViTs), have been employed to tackle early glaucoma detection. Nevertheless, few approaches have been proposed to improve glaucoma classification, owing to issues such as inadequate training data, variations in feature distribution, and the overall quality of samples. Furthermore, fundus images exhibit strong inter-image similarity and only slight differences in lesion size, complicating glaucoma classification with ViTs. To address these obstacles, we introduce the contour-guided and augmented vision transformer (CA-ViT) for enhanced glaucoma classification from fundus images. We employ a Conditional Variational Generative Adversarial Network (CVGAN) to enhance and diversify the training dataset through conditional sample generation and reconstruction. A contour-guided approach is then integrated to provide crucial disease cues, particularly concerning the optic disc and optic cup regions. Both the original images and the extracted contours are fed to the ViT backbone, and feature alignment is performed with a weighted cross-entropy loss. Finally, at inference, the ViT backbone, trained on the original fundus images and the augmented data, performs multi-class glaucoma categorization. Using the Standardized Multi-Channel Dataset for Glaucoma (SMDG), which encompasses several datasets (e.g., EYEPACS, DRISHTI-GS, RIM-ONE, REFUGE), we conducted thorough experiments. The results indicate that the proposed CA-ViT model significantly outperforms current methods, achieving a precision of 93.0%, a recall of 93.08%, an F1 score of 92.9%, and an accuracy of 93.0%.
Therefore, the integration of augmentation with the CVGAN and contour guidance can effectively enhance glaucoma classification tasks.
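The abstract does not give the exact formulation of the weighted cross-entropy loss used for feature alignment. As a rough illustration only, the following minimal NumPy sketch shows a per-class weighted cross-entropy of the kind commonly used to counter class imbalance; the weighting scheme and the idea of averaging the loss over the original-image and contour branches are assumptions, not the authors' published implementation.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with a max-shift for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def weighted_cross_entropy(logits, labels, class_weights):
    """Per-class weighted cross-entropy, averaged over the batch.

    logits:        (N, C) raw class scores
    labels:        (N,) integer class indices
    class_weights: (C,) per-class weights (e.g. inverse class frequency)
    """
    probs = softmax(logits)
    n = logits.shape[0]
    picked = probs[np.arange(n), labels]          # probability of the true class
    w = class_weights[labels]                     # weight of each sample's class
    return float(np.mean(-w * np.log(picked + 1e-12)))

def dual_branch_loss(logits_image, logits_contour, labels, class_weights):
    """Hypothetical alignment objective: average the weighted CE of the
    original-image branch and the contour branch (an assumption for
    illustration, not the paper's exact loss)."""
    return 0.5 * (weighted_cross_entropy(logits_image, labels, class_weights)
                  + weighted_cross_entropy(logits_contour, labels, class_weights))
```

In this sketch, upweighting a minority class (e.g. a rare glaucoma grade) scales its contribution to the gradient, which is one standard way a weighted cross-entropy mitigates the imbalanced training data the abstract mentions.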