Telecom Bretagne, Laboratoire Traitement de l'Information Médicale, Brest, France.
Invest Ophthalmol Vis Sci. 2011 Oct 21;52(11):8342-8. doi: 10.1167/iovs.11-7418.
Recent studies on diabetic retinopathy (DR) screening in fundus photographs suggest that disagreements between algorithms and clinicians are now comparable to disagreements among clinicians. The purpose of this study is to (1) determine whether this observation also holds for automated DR severity assessment algorithms, and (2) demonstrate the value of such algorithms in clinical practice.
A dataset of 85 consecutive DR examinations (168 eyes, 1176 multimodal eye fundus photographs) was collected at Brest University Hospital (Brest, France). Two clinicians with different experience levels determined DR severity in each eye, according to the International Clinical Diabetic Retinopathy Disease Severity (ICDRS) scale. Based on Cohen's kappa (κ) measurements, the performance of clinicians at assessing DR severity was compared to the performance of state-of-the-art content-based image retrieval (CBIR) algorithms from our group.
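Agreement in this study is quantified with Cohen's kappa (κ). As a minimal illustrative sketch (not the study's code or data), the Python snippet below computes unweighted Cohen's kappa from two raters' per-eye ICDRS grades; the function name and the example grades are hypothetical.

    from collections import Counter

    def cohen_kappa(rater_a, rater_b):
        """Unweighted Cohen's kappa between two lists of categorical grades."""
        assert len(rater_a) == len(rater_b)
        n = len(rater_a)

        # Observed agreement: fraction of eyes graded identically.
        p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

        # Chance agreement, from each rater's marginal grade distribution.
        counts_a = Counter(rater_a)
        counts_b = Counter(rater_b)
        p_e = sum(counts_a[c] * counts_b.get(c, 0) for c in counts_a) / (n * n)

        return (p_o - p_e) / (1 - p_e)

    # Illustrative ICDRS grades (0 = no DR ... 4 = proliferative DR), made-up values.
    junior = [0, 1, 2, 2, 3, 0, 1, 4]
    senior = [0, 1, 2, 3, 3, 0, 2, 4]
    print(f"kappa = {cohen_kappa(junior, senior):.3f}")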
When assessing DR severity in each patient, intraobserver agreement was κ = 0.769 for the most experienced clinician. Interobserver agreement between the two clinicians was κ = 0.526, and agreement between the most experienced clinician and the most advanced algorithm was κ = 0.592. Moreover, the most advanced algorithm was often able to predict agreements and disagreements between the clinicians.
Automated DR severity assessment algorithms, trained to imitate experienced clinicians, can be used to predict when less experienced clinicians will agree or disagree with their more experienced colleagues. Such algorithms may thus be used in clinical practice to help validate or challenge those diagnoses. CBIR algorithms, in particular, may also be used to pool diagnostic knowledge among peers, with applications in the training of clinicians and the coordination of their prescriptions.