Google Research, Google Inc., Mountain View, California.
Department of Ophthalmology, Palo Alto Medical Foundation, Palo Alto, California.
Ophthalmology. 2018 Aug;125(8):1264-1272. doi: 10.1016/j.ophtha.2018.01.034. Epub 2018 Mar 13.
To use adjudication to quantify errors in diabetic retinopathy (DR) grading by individual graders and by majority decision, and to train an improved automated algorithm for DR grading.
Retrospective analysis.
Retinal fundus images from DR screening programs.
Images were each graded by the algorithm, U.S. board-certified ophthalmologists, and retinal specialists. The adjudicated consensus of the retinal specialists served as the reference standard.
For agreement between different graders as well as between the graders and the algorithm, we measured the (quadratic-weighted) kappa score. To compare the performance of different forms of manual grading and the algorithm for various DR severity cutoffs (e.g., mild or worse DR, moderate or worse DR), we measured area under the curve (AUC), sensitivity, and specificity.
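The metrics above can be made concrete with a small worked sketch. The following is illustrative only (it is not the study's code, and all grades and scores in it are made up): it computes quadratic-weighted kappa on a 5-point DR scale (0 = none, 1 = mild, 2 = moderate, 3 = severe, 4 = proliferative), then binarizes grades at a severity cutoff (here, moderate or worse) to obtain sensitivity and specificity, and computes AUC from a hypothetical continuous algorithm score.

```python
def quadratic_weighted_kappa(a, b, k=5):
    """Cohen's kappa with quadratic weights: disagreements are penalized
    by the squared distance between grades, so a 1-step error costs far
    less than a 4-step error."""
    n = len(a)
    obs = [[0] * k for _ in range(k)]
    for x, y in zip(a, b):
        obs[x][y] += 1
    ra = [a.count(i) for i in range(k)]   # marginal counts, rater A
    rb = [b.count(i) for i in range(k)]   # marginal counts, rater B
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2
            num += w * obs[i][j]              # observed weighted disagreement
            den += w * ra[i] * rb[j] / n      # chance-expected disagreement
    return 1 - num / den

def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count half) -- equivalent to the area under the ROC curve."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical grades: adjudicated consensus vs. one grader.
reference = [0, 0, 1, 2, 2, 3, 4, 0, 1, 2]
grader    = [0, 1, 1, 2, 1, 3, 4, 0, 2, 2]

kappa = quadratic_weighted_kappa(reference, grader)

# Binarize at "moderate or worse DR" (grade >= 2) for referable disease.
cutoff = 2
ref_bin  = [int(g >= cutoff) for g in reference]
pred_bin = [int(g >= cutoff) for g in grader]
tp = sum(r and p for r, p in zip(ref_bin, pred_bin))
fn = sum(r and not p for r, p in zip(ref_bin, pred_bin))
tn = sum(not r and not p for r, p in zip(ref_bin, pred_bin))
fp = sum(not r and p for r, p in zip(ref_bin, pred_bin))
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)

# Hypothetical continuous severity scores from an algorithm, evaluated
# against the binarized reference standard.
scores = [0.05, 0.30, 0.20, 0.80, 0.40, 0.95, 0.99, 0.10, 0.55, 0.70]
algorithm_auc = auc(ref_bin, scores)
```

Note that kappa is computed on the full 5-point scale, whereas sensitivity, specificity, and AUC each require choosing a severity cutoff first, which is why the abstract reports them separately for "mild or worse" and "moderate or worse" DR.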
Of the 193 discrepancies between adjudication by retinal specialists and the majority decision of ophthalmologists, the most common were missed microaneurysms (MAs) (36%), artifacts (20%), and misclassified hemorrhages (16%). Relative to the reference standard, kappa scores for individual retinal specialists ranged from 0.82 to 0.91, those for individual ophthalmologists ranged from 0.80 to 0.84, and the algorithm scored 0.84. For moderate or worse DR, the majority decision of ophthalmologists had a sensitivity of 0.838 and specificity of 0.981. The algorithm had a sensitivity of 0.971, specificity of 0.923, and AUC of 0.986. For mild or worse DR, the algorithm had a sensitivity of 0.970, specificity of 0.917, and AUC of 0.986. By using a small number of adjudicated consensus grades as a tuning dataset and higher-resolution images as input, the algorithm's AUC for moderate or worse DR improved from 0.934 to 0.986.
Adjudication reduces the errors in DR grading. A small set of adjudicated DR grades allows substantial improvements in algorithm performance. The resulting algorithm's performance was on par with that of individual U.S. board-certified ophthalmologists and retinal specialists.