Google Health, London, United Kingdom.
National Institute for Health Research Biomedical Research Centre for Ophthalmology, Moorfields Eye Hospital NHS (National Health Service) Foundation Trust, London, United Kingdom.
JAMA Ophthalmol. 2021 Sep 1;139(9):964-973. doi: 10.1001/jamaophthalmol.2021.2273.
IMPORTANCE: Quantitative volumetric measures of retinal disease in optical coherence tomography (OCT) scans are infeasible to perform owing to the time required for manual grading. Expert-level deep learning systems for automatic OCT segmentation have recently been developed. However, the potential clinical applicability of these systems is largely unknown.
OBJECTIVE: To evaluate a deep learning model for whole-volume segmentation of 4 clinically important pathological features and assess its clinical applicability.
DESIGN, SETTING, AND PARTICIPANTS: This diagnostic study used OCT data from 173 patients, with a total of 15 558 B-scans, treated at Moorfields Eye Hospital. The data set included 2 common OCT devices and 2 macular conditions, wet age-related macular degeneration (107 scans) and diabetic macular edema (66 scans), and covered the full range of severity and 3 time points during treatment. Two expert graders performed pixel-level segmentations of intraretinal fluid, subretinal fluid, subretinal hyperreflective material, and pigment epithelial detachment on every B-scan in each OCT volume, taking as long as 50 hours per scan. Whole-volume model segmentations were evaluated quantitatively, and 3 retinal experts evaluated clinical applicability qualitatively. Data were collected from June 1, 2012, to January 31, 2017, for set 1 and from January 1 to December 31, 2017, for set 2; graded between November 2018 and January 2020; and analyzed from February 2020 to November 2020.
MAIN OUTCOMES AND MEASURES: Rating and stack ranking for clinical applicability by retinal specialists; model-grader agreement for voxelwise segmentations, evaluated using Dice similarity coefficients; and agreement on total volume, evaluated using Bland-Altman plots and intraclass correlation coefficients.
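As an illustration of two of the agreement measures named above, the following is a minimal sketch, not the authors' implementation. It assumes segmentations are available as binary NumPy voxel masks and that per-scan volumes are plain numeric lists; the function names and the toy data are hypothetical.

```python
import numpy as np


def dice_coefficient(mask_a, mask_b):
    """Dice similarity: 2|A ∩ B| / (|A| + |B|) for two binary voxel masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / total)


def bland_altman_limits(model_volumes, grader_volumes):
    """Return (bias, lower limit, upper limit) of agreement for paired volumes."""
    diffs = np.asarray(model_volumes, dtype=float) - np.asarray(grader_volumes, dtype=float)
    bias = float(diffs.mean())
    spread = 1.96 * float(diffs.std(ddof=1))  # sample standard deviation
    return bias, bias - spread, bias + spread


# Toy volume: 2 B-scans of 4 x 4 pixels (hypothetical data).
model = np.zeros((2, 4, 4), dtype=bool)
grader = np.zeros((2, 4, 4), dtype=bool)
model[0, :2, :2] = True    # 4 voxels marked by the model
grader[0, :2, :3] = True   # 6 voxels marked by the grader, 4 shared

print(dice_coefficient(model, grader))
print(bland_altman_limits([1.0, 2.0, 3.0], [1.5, 2.0, 2.5]))
```

Dice measures voxel-level overlap for a single feature, whereas the Bland-Altman limits summarize how far model and grader volumes diverge across scans, so the two can disagree: a small lesion can have a low Dice score yet a negligible volume difference.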
RESULTS: Among the 173 patients included in the analysis (92 [53%] women), qualitative assessment found that automated whole-volume segmentation ranked better than or comparable to at least 1 expert grader in 127 scans (73%; 95% CI, 66%-79%). A neutral or positive rating was given to 135 model segmentations (78%; 95% CI, 71%-84%) and 309 expert gradings (2 per scan) (89%; 95% CI, 86%-92%). The model was rated neutrally or positively in 86% to 92% of diabetic macular edema scans and 53% to 87% of age-related macular degeneration scans. Intraclass correlations ranged from 0.33 (95% CI, 0.08-0.96) to 0.96 (95% CI, 0.90-0.99). Dice similarity coefficients ranged from 0.43 (95% CI, 0.29-0.66) to 0.78 (95% CI, 0.57-0.85).
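The confidence intervals for the reported proportions can be approximated with a Wilson score interval. The abstract does not state which interval method the authors used, so the choice of Wilson here is an assumption, and the function name is hypothetical. With the counts given above (127 of 173, 135 of 173, and 309 of 346 gradings), this sketch returns intervals that round to the published values.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# Counts taken from the results: model ranked comparable or better,
# model rated neutral or positive, expert gradings rated neutral or positive.
for successes, n in [(127, 173), (135, 173), (309, 346)]:
    low, high = wilson_interval(successes, n)
    print(f"{successes}/{n}: {successes / n:.0%} ({low:.0%}-{high:.0%})")
```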
CONCLUSIONS AND RELEVANCE: This deep learning-based segmentation tool provided clinically useful measures of retinal disease that would otherwise be infeasible to obtain. Qualitative evaluation was also important in revealing clinical applicability for both care management and research.