Lin Mingquan, Xiao Yunyu, Hou Bojian, Wanyan Tingyi, Sharma Mohit Manoj, Wang Zhangyang, Wang Fei, Van Tassel Sarah, Peng Yifan
Population Health Sciences, Weill Cornell Medicine, New York, NY, USA.
Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania.
AMIA Jt Summits Transl Sci Proc. 2023 Jun 16;2023:370-377. eCollection 2023.
In the United States, primary open-angle glaucoma (POAG) is the leading cause of blindness, especially among African American and Hispanic individuals. Deep learning has been widely used to detect POAG from fundus images because its performance is comparable to, or even surpasses, that of clinicians. However, human bias in clinical diagnosis may be reflected and amplified in widely used deep learning models, thus affecting their performance. Such biases may cause (1) underdiagnosis, which increases the risk of delayed or inadequate treatment, and (2) overdiagnosis, which may increase individuals' stress and fear, reduce their well-being, and lead to unnecessary or costly treatment. In this study, we examined underdiagnosis and overdiagnosis when applying deep learning to POAG detection, using data from the Ocular Hypertension Treatment Study (OHTS), which was conducted at 22 centers across 16 states in the United States. Our results show that a widely used deep learning model can underdiagnose or overdiagnose under-served populations. The most underdiagnosed group is younger (< 60 yrs) female participants, and the most overdiagnosed group is older (≥ 60 yrs) Black participants. Biased diagnosis by traditional deep learning methods may delay disease detection and treatment and create burdens among under-served populations, thereby raising ethical concerns about using deep learning models in ophthalmology clinics.
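As an illustration of the kind of subgroup comparison described above, the following minimal sketch computes per-group error rates. It assumes that underdiagnosis is measured as the false negative rate and overdiagnosis as the false positive rate, a common convention that this abstract does not itself state. The function name `subgroup_error_rates`, the group labels, and the toy records are hypothetical and are not taken from the OHTS data or the authors' code.

```python
from collections import defaultdict


def subgroup_error_rates(records):
    """Compute per-group under- and overdiagnosis rates.

    records: iterable of (group, y_true, y_pred), where 1 = POAG, 0 = no POAG.
    Assumption: underdiagnosis rate = false negative rate,
                overdiagnosis rate  = false positive rate.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for group, y_true, y_pred in records:
        if y_true == 1:
            key = "tp" if y_pred == 1 else "fn"
        else:
            key = "fp" if y_pred == 1 else "tn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]  # participants who truly have POAG
        negatives = c["fp"] + c["tn"]  # participants who truly do not
        rates[group] = {
            # None when a group has no cases of the relevant class
            "underdiagnosis_rate": c["fn"] / positives if positives else None,
            "overdiagnosis_rate": c["fp"] / negatives if negatives else None,
        }
    return rates


# Hypothetical toy records: (group, true label, model prediction)
toy_records = [
    ("female, < 60", 1, 0),
    ("female, < 60", 1, 1),
    ("female, < 60", 0, 0),
    ("Black, >= 60", 0, 1),
    ("Black, >= 60", 0, 0),
    ("Black, >= 60", 1, 1),
]

rates = subgroup_error_rates(toy_records)
for group, r in rates.items():
    print(group, r)
```

Comparing these two rates across demographic groups, rather than reporting a single overall accuracy, is what allows disparities between groups to become visible.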