Chiang Michael F, Gelman Rony, Jiang Lei, Martinez-Perez M Elena, Du Yunling E, Flynn John T
Department of Ophthalmology, Columbia University College of Physicians and Surgeons, New York, New York, USA.
Trans Am Ophthalmol Soc. 2007;105:73-84; discussion 84-5.
To measure agreement and accuracy of plus disease diagnosis among retinopathy of prematurity (ROP) experts, and to compare expert performance with that of a computer-based analysis system, Retinal Image multiScale Analysis.
Twenty-two recognized ROP experts independently interpreted a set of 34 wide-angle retinal photographs for the presence of plus disease, and diagnostic agreement among them was analyzed. A reference standard was defined by majority vote of the experts. Images were analyzed with the computer-based system using individual parameters, and linear combinations of parameters, measured on arterioles and venules: integrated curvature (IC), diameter, and tortuosity index (TI). Sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) for plus disease diagnosis were determined for each expert and for the computer-based system.
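The study design above involves a few standard computations: a majority-vote reference standard, each rater's sensitivity and specificity against it, and the AUC of continuous computer parameters, alone or in linear combination. A minimal Python sketch follows, assuming binary plus/not-plus votes coded 1/0. All function names, the tie rule, and the combination weights are hypothetical illustrations; the abstract does not say how ties among the experts were resolved or how combination weights were chosen. AUC is computed here with the rank (Mann-Whitney) method, one standard way to obtain it, not necessarily the study's.

```python
from typing import List, Sequence, Tuple


def majority_reference(votes: Sequence[Sequence[int]]) -> List[int]:
    """Reference label per image from expert votes (1 = plus, 0 = not plus).

    An image is labeled plus only when a strict majority voted plus; a tie
    falls to 0. The tie rule is an assumption, not the study's stated rule.
    """
    return [1 if 2 * sum(image_votes) > len(image_votes) else 0
            for image_votes in votes]


def sensitivity_specificity(pred: Sequence[int],
                            ref: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity and specificity of binary calls against the reference."""
    pairs = list(zip(pred, ref))
    tp = sum(1 for p, r in pairs if p == 1 and r == 1)
    fn = sum(1 for p, r in pairs if p == 0 and r == 1)
    tn = sum(1 for p, r in pairs if p == 0 and r == 0)
    fp = sum(1 for p, r in pairs if p == 1 and r == 0)
    return tp / (tp + fn), tn / (tn + fp)


def linear_score(features: Sequence[Sequence[float]],
                 weights: Sequence[float]) -> List[float]:
    """Weighted sum of per-image parameters (e.g. IC, diameter, TI).

    The weights are hypothetical placeholders; the abstract does not say
    how the study's combination weights were chosen.
    """
    return [sum(w * f for w, f in zip(weights, row)) for row in features]


def auc(scores: Sequence[float], ref: Sequence[int]) -> float:
    """AUC via the Mann-Whitney statistic: the probability that a randomly
    chosen plus image outscores a randomly chosen non-plus image, with ties
    counted as one half."""
    pos = [s for s, r in zip(scores, ref) if r == 1]
    neg = [s for s, r in zip(scores, ref) if r == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 4 images rated by 3 experts.
votes = [[1, 1, 0], [0, 0, 1], [1, 1, 1], [0, 0, 0]]
ref = majority_reference(votes)                # [1, 0, 1, 0]
expert_c = [row[2] for row in votes]           # [0, 1, 1, 0]
print(sensitivity_specificity(expert_c, ref))  # (0.5, 0.5)
print(auc([0.5, 0.6, 0.9, 0.1], ref))          # 0.75
```

Both ratios in `sensitivity_specificity` and the AUC divide by zero if the reference contains no plus or no non-plus images, which cannot happen in a data set for which both sensitivity and specificity are reported.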
The mean kappa statistic for each expert compared with all other experts was 0 to 0.20 (slight agreement) for 1 expert (4.5%), 0.21 to 0.40 (fair agreement) for 3 experts (13.6%), 0.41 to 0.60 (moderate agreement) for 12 experts (54.5%), and 0.61 to 0.80 (substantial agreement) for 6 experts (27.3%). Across the 22 experts, sensitivity relative to the reference standard ranged from 0.308 to 1.000, specificity from 0.571 to 1.000, and AUC from 0.784 to 1.000. Among individual computer system parameters, venular IC had the highest AUC relative to the reference standard (0.853). Among linear combinations of parameters, the combination of arteriolar IC, arteriolar TI, venular IC, venular diameter, and venular TI had the highest AUC (0.967).
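The per-expert agreement figures above can be illustrated the same way. The sketch below assumes unweighted Cohen's kappa on binary labels, averaged over each expert's pairings with every other expert, and maps the mean onto the interpretive bands named in the results, which match the conventional Landis and Koch scale. The abstract does not state which kappa variant the study used, so treat that choice as an assumption; the function names and toy ratings are hypothetical.

```python
from typing import List, Sequence


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Unweighted Cohen's kappa for two raters' binary labels (1 = plus)."""
    n = len(a)
    p_obs = sum(1 for x, y in zip(a, b) if x == y) / n
    pa, pb = sum(a) / n, sum(b) / n
    # Chance agreement: both call plus, or both call not plus.
    p_exp = pa * pb + (1 - pa) * (1 - pb)
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when both raters use one label")
    return (p_obs - p_exp) / (1 - p_exp)


def mean_kappa_per_expert(ratings: Sequence[Sequence[int]]) -> List[float]:
    """For each expert, the mean kappa against every other expert.

    ratings[e][i] is expert e's label for image i.
    """
    m = len(ratings)
    return [
        sum(cohen_kappa(ratings[e], ratings[o]) for o in range(m) if o != e) / (m - 1)
        for e in range(m)
    ]


def landis_koch(kappa: float) -> str:
    """Interpretive band for a kappa value (Landis and Koch scale)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Toy data (hypothetical): 3 experts, 4 images.
ratings = [[1, 0, 1, 0], [1, 0, 1, 0], [1, 1, 0, 0]]
means = mean_kappa_per_expert(ratings)
print(means)                            # [0.5, 0.5, 0.0]
print([landis_koch(k) for k in means])  # ['moderate', 'moderate', 'slight']
```

With 22 experts, each mean averages 21 pairwise values; tallying the 22 means into bands gives counts like those reported (for example, 12 of 22 experts is 54.5%).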
Agreement and accuracy of plus disease diagnosis among ROP experts are imperfect. A computer-based system has the potential to perform with accuracy comparable to or better than that of human experts, but further validation is required.