Applied Physics Laboratory, The Johns Hopkins University, Baltimore, Maryland.
Wilmer Eye Institute, Retina Division, The Johns Hopkins University School of Medicine, Baltimore, Maryland.
JAMA Ophthalmol. 2020 Oct 1;138(10):1070-1077. doi: 10.1001/jamaophthalmol.2020.3269.
IMPORTANCE: Recent studies have demonstrated the successful application of artificial intelligence (AI) to automated retinal disease diagnostics but have not addressed a fundamental challenge for deep learning systems: the current need for large, criterion standard–annotated retinal data sets for training. Low-shot learning algorithms, which aim to learn from relatively few training examples, may be beneficial in clinical situations involving rare retinal diseases or when addressing potential bias from training data that may not adequately represent certain groups, such as individuals older than 85 years.
OBJECTIVE: To evaluate whether low-shot deep learning methods are beneficial when only small training data sets are available for automated retinal diagnostics.
DESIGN, SETTING, AND PARTICIPANTS: This cross-sectional study, conducted from July 1, 2019, to June 21, 2020, compared traditional and low-shot diabetic retinopathy classification algorithms on a 2-class task (diabetic retinopathy warranting referral vs not warranting referral). The public domain EyePACS data set, which originally included 88 692 fundus images from 44 346 individuals, was used. Statistical analysis was performed from February 1 to June 21, 2020.
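The 2-class setup and the per-class training sizes can be illustrated with a minimal sketch. It assumes EyePACS-style grades 0 to 4 and treats grade 2 or higher as "warranting referral"; that cutoff, the helper names (`to_binary_label`, `balanced_subsample`), and the halving schedule between 5120 and 10 samples per class are illustrative assumptions, not details taken from the study.

```python
import random
from collections import defaultdict

# Assumed cutoff (hypothetical): grades 0-4, with grade >= 2 treated as referable.
REFERABLE_MIN_GRADE = 2


def to_binary_label(grade: int) -> int:
    """Map a 0-4 diabetic retinopathy grade to 1 (refer) or 0 (do not refer)."""
    if not 0 <= grade <= 4:
        raise ValueError(f"grade must be in 0..4, got {grade}")
    return int(grade >= REFERABLE_MIN_GRADE)


def balanced_subsample(records, n_per_class, seed=0):
    """Draw n_per_class images from each binary class without replacement.

    records: iterable of (image_id, grade) pairs.
    Returns a list of (image_id, binary_label) pairs.
    """
    by_class = defaultdict(list)
    for image_id, grade in records:
        by_class[to_binary_label(grade)].append(image_id)
    rng = random.Random(seed)  # fixed seed so a given subset is reproducible
    subset = []
    for label in (0, 1):
        pool = by_class[label]
        if len(pool) < n_per_class:
            raise ValueError(f"class {label} has only {len(pool)} examples")
        subset.extend((image_id, label) for image_id in rng.sample(pool, n_per_class))
    return subset


# Hypothetical halving schedule: 5120, 2560, ..., 160, ..., 20, 10 samples per class.
SHOT_SCHEDULE = [5120 // 2**k for k in range(10)]
```

For example, `balanced_subsample(records, 160)` would yield a 160-per-class training set; drawing each class separately keeps the two classes equally represented at every training size.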
MAIN OUTCOMES AND MEASURES: The performance of the AI algorithms, with 95% CIs, was measured using receiver operating characteristic curves and their area under the curve (AUC), precision-recall curves, accuracy, and F1 score, evaluated for training data sizes ranging from 5120 down to 10 samples per class.
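Several of the measures above can be computed from per-image scores with a short sketch. It computes the AUC as the probability that a randomly chosen referable image outscores a randomly chosen nonreferable one, plus accuracy and F1 at a fixed threshold. The 0.5 threshold and the Hanley-McNeil normal approximation for the 95% CI are assumptions for illustration; the abstract does not state how the study derived its intervals, and precision-recall curves are not covered here.

```python
import math


def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p_score in pos:
        for n_score in neg:
            wins += 1.0 if p_score > n_score else 0.5 if p_score == n_score else 0.0
    return wins / (len(pos) * len(neg))


def auc_ci_hanley_mcneil(auc, n_pos, n_neg, z=1.96):
    """Approximate 95% CI for an AUC using the Hanley-McNeil (1982) standard error."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc**2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc**2)
           + (n_neg - 1) * (q2 - auc**2)) / (n_pos * n_neg)
    se = math.sqrt(var)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


def accuracy_and_f1(labels, scores, threshold=0.5):
    """Accuracy and F1 after thresholding scores into refer (1) / do not refer (0)."""
    preds = [int(s >= threshold) for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    accuracy = sum(1 for y, p in zip(labels, preds) if y == p) / len(labels)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1
```

With labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, three of the four positive-negative pairs are ranked correctly, so `roc_auc` returns 0.75.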
RESULTS: When trained with sufficiently large data sets (5120 samples per class), the deep learning algorithms performed comparably: a traditional approach (eg, fine-tuned ResNet) had an AUC of 0.8330 (95% CI, 0.8140-0.8520), and a low-shot method using self-supervised Deep InfoMax (our method, denoted DIM) had an AUC of 0.8348 (95% CI, 0.8159-0.8537). However, when far fewer training images were available (n = 160), the AUC of the traditional deep learning approach decreased to 0.6585 (95% CI, 0.6332-0.6838), and it was outperformed by a low-shot method using self-supervision, with an AUC of 0.7467 (95% CI, 0.7239-0.7695). At very low shots (n = 10), the traditional approach performed close to chance, with an AUC of 0.5178 (95% CI, 0.4909-0.5447), compared with an AUC of 0.5778 (95% CI, 0.5512-0.6044) for the best low-shot method.
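To make the self-supervised component more concrete, the sketch below computes the Jensen-Shannon mutual-information estimate used in the original Deep InfoMax formulation (Hjelm et al), which the encoder is trained to maximize without any labels. It is simplified to a single square matrix of critic scores, where `scores[i][j]` stands for T(x_j, E(x_i)); the critic, the encoder, and whether this study used the JSD estimator rather than another variant are assumptions, not details given in the abstract.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def jsd_mi_estimate(scores):
    """Jensen-Shannon MI estimate from a square matrix of critic scores.

    Diagonal entries pair an encoding with its own image (joint samples);
    off-diagonal entries pair it with a different image (product of marginals).
    Estimate = E_joint[-softplus(-T)] - E_marginal[softplus(T)].
    """
    n = len(scores)
    if n < 2 or any(len(row) != n for row in scores):
        raise ValueError("scores must be a square matrix with n >= 2")
    pos = [scores[i][i] for i in range(n)]
    neg = [scores[i][j] for i in range(n) for j in range(n) if i != j]
    joint_term = sum(-softplus(-t) for t in pos) / len(pos)
    marginal_term = sum(softplus(t) for t in neg) / len(neg)
    return joint_term - marginal_term
```

A critic that cannot tell matched from mismatched pairs (all scores 0) gives the minimum value of -2 log 2, while one that scores matched pairs high and mismatched pairs low pushes the estimate toward 0. In a low-shot setting, the intent is that an encoder pretrained this way on unlabeled fundus images needs fewer labeled examples for the downstream referral classifier.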
CONCLUSIONS AND RELEVANCE: These findings suggest potential benefits of low-shot methods for AI retinal diagnostics when only a limited number of annotated retinal training images are available (eg, with rare ophthalmic diseases or when addressing potential AI bias).