Du Katherine, Shah Stavan, Bollepalli Sandeep Chandra, Ibrahim Mohammed Nasar, Gadari Adarsh, Sutharahan Shan, Sahel José-Alain, Chhablani Jay, Vupparaboina Kiran Kumar
Department of Ophthalmology, University of Pittsburgh Medical Center, Pittsburgh, PA, United States of America.
Department of Computer Science, University of North Carolina at Greensboro, Greensboro, NC, United States of America.
PLoS One. 2024 Dec 18;19(12):e0314707. doi: 10.1371/journal.pone.0314707. eCollection 2024.
Various imaging features on optical coherence tomography (OCT) are crucial for identifying and defining disease progression. Establishing a consensus on these imaging features is essential, particularly for training deep learning models for disease classification. This study aims to analyze the inter-rater reliability in labeling the quality and common imaging signatures of retinal OCT scans.
A total of 500 OCT scans obtained from CIRRUS HD-OCT 5000 devices were displayed at 512×1024×128 resolution using customizable, in-house annotation software. Each patient's eye was represented by 16 random scans. Two masked reviewers independently labeled the quality and specific pathological features of each scan. Evaluated features included overall image quality, presence of the fovea, and disease signatures including subretinal fluid (SRF), intraretinal fluid (IRF), drusen, pigment epithelial detachment (PED), and hyperreflective material. Raw percentage agreement and Cohen's kappa (κ) coefficient were used to evaluate concurrence between the two sets of labels.
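As context for the statistic used above, Cohen's κ adjusts the raw percentage agreement for the agreement expected by chance given each rater's label frequencies. A minimal sketch of the computation (the label vectors below are illustrative, not study data):

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two raters.

    labels_a, labels_b: equal-length sequences of categorical labels,
    one entry per rated item (e.g. per OCT scan).
    """
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each rater's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical binary labels (1 = feature present) from two raters:
rater_1 = [1, 1, 0, 1, 0, 1, 0, 0, 1, 1]
rater_2 = [1, 0, 0, 1, 0, 1, 0, 1, 1, 1]
print(cohen_kappa(rater_1, rater_2))  # 0.583... despite 80% raw agreement
```

In practice the same value is available as `sklearn.metrics.cohen_kappa_score`; the point of the chance correction is visible above, where 80% raw agreement shrinks to κ ≈ 0.58 once the raters' similar base rates are accounted for.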
Our analysis revealed κ = 0.60 for the inter-rater reliability of overall scan quality, indicating substantial agreement. In contrast, there was only slight agreement in determining the cause of poor image quality (κ = 0.18). The binary determination of the presence or absence of retinal disease signatures showed almost perfect agreement between reviewers (κ = 0.85). Specific retinal features, such as the foveal location of the scan (κ = 0.78), IRF (κ = 0.63), drusen (κ = 0.73), and PED (κ = 0.87), exhibited substantial concordance. However, less agreement was found in identifying SRF (κ = 0.52), hyperreflective dots (κ = 0.41), and hyperreflective foci (κ = 0.33).
Our study demonstrates significant inter-rater reliability in labeling the quality and retinal pathologies on OCT scans. While some features show stronger agreement than others, these standardized labels can be utilized to create automated machine learning tools for diagnosing retinal diseases and capturing valuable pathological features in each scan. This standardization will aid in the consistency of medical diagnoses and enhance the accessibility of OCT diagnostic tools.