Shi Min, Afzal Muhammad Muneeb, Huang Hao, Wen Congcong, Luo Yan, Khan Muhammad Osama, Tian Yu, Kim Leo, Fang Yi, Wang Mengyu
Harvard Ophthalmology AI Lab, Schepens Eye Research Institute of Massachusetts Eye and Ear, Harvard Medical School, Boston, MA, USA.
Tandon School of Engineering, New York University, New York, NY, USA.
Transl Vis Sci Technol. 2025 Jul 1;14(7):1. doi: 10.1167/tvst.14.7.1.
To investigate the fairness of existing deep learning models for diabetic retinopathy (DR) detection and to introduce an equitable model that reduces performance disparities across demographic groups.
We evaluated the performance and fairness of various deep learning models for DR detection using fundus images and optical coherence tomography (OCT) B-scans. A Fair Adaptive Scaling (FAS) module was developed to reduce group disparities. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), and equity across groups was assessed with the equity-scaled AUC, which accounts for both the overall AUC and the AUCs of the individual groups.
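The abstract does not give the exact formula for the equity-scaled AUC. The sketch below shows one plausible formulation consistent with the description, in which the overall AUC is discounted by its absolute gaps to the per-group AUCs; the function name equity_scaled_auc and the groups argument are illustrative, and the paper's precise definition may differ.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def equity_scaled_auc(y_true, y_score, groups):
    """Return the overall AUC, per-group AUCs, and an equity-scaled AUC.

    Assumed formulation: ES-AUC = AUC / (1 + sum_g |AUC - AUC_g|),
    i.e., the overall AUC penalized by its deviations from each group's AUC.
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    groups = np.asarray(groups)

    overall_auc = roc_auc_score(y_true, y_score)

    group_aucs = {}
    for g in np.unique(groups):
        mask = groups == g
        # A group AUC is only defined when both classes occur in the group.
        if len(np.unique(y_true[mask])) == 2:
            group_aucs[g] = roc_auc_score(y_true[mask], y_score[mask])

    gap = sum(abs(overall_auc - auc_g) for auc_g in group_aucs.values())
    es_auc = overall_auc / (1.0 + gap)
    return overall_auc, group_aucs, es_auc
```

Under this formulation, a model whose group AUCs all equal the overall AUC keeps its full score, while larger group disparities shrink the equity-scaled AUC toward zero.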
Using color fundus images, the integration of FAS with EfficientNet improved the overall AUC and equity-scaled AUC from 0.88 and 0.83 to 0.90 and 0.84, respectively, when stratified by race (P < 0.05). The AUCs for Asians and Whites increased by 0.05 and 0.03, respectively (P < 0.01). For gender, both metrics improved by 0.01 (P < 0.05). On OCT B-scans stratified by race, FAS with DenseNet121 improved the overall AUC and equity-scaled AUC from 0.875 and 0.81 to 0.884 and 0.82, with AUC gains of 0.03 and 0.02 for Asians and Blacks, respectively (P < 0.01). For gender, DenseNet121's overall AUC and equity-scaled AUC rose by 0.04 and 0.03, with AUC gains of 0.05 and 0.04 for females and males, respectively (P < 0.01).
Deep learning models demonstrate varying accuracy across demographic groups in DR detection. FAS improves both the equity and the accuracy of these models.
The proposed deep learning model has the potential to improve both the performance and the equity of DR detection.