Manatunga Amita K, Binongo José Nilo G, Taylor Andrew T
Department of Biostatistics and Bioinformatics, Emory University School of Public Health, 1364 Clifton Road NE, Atlanta, GA 30322, USA.
EJNMMI Res. 2011 Jun 20;1(5):1-8. doi: 10.1186/2191-219X-1-5.
BACKGROUND: The accuracy of computer-aided diagnosis (CAD) software is best evaluated by comparison to a gold standard that represents the true status of disease. In many settings, however, the true status of disease cannot be known, and accuracy is instead evaluated against the interpretations of an expert panel. Common statistical approaches to evaluating accuracy include receiver operating characteristic (ROC) and kappa analysis, but both methods have significant limitations and cannot answer the question of equivalence: is the CAD performance equivalent to that of an expert? The goal of this study is to show the strength of log-linear analysis over standard ROC and kappa statistics in evaluating the accuracy of computer-aided diagnosis of renal obstruction against the diagnoses provided by expert readers.
METHODS: Log-linear modeling was used to analyze a previously published database in which ROC and kappa statistics had been used to compare diuresis renography scan interpretations (non-obstructed, equivocal, or obstructed) generated by a renal expert system (RENEX) in 185 kidneys (95 patients) with the independent and consensus interpretations of three experts, who were blinded to clinical information and prospectively and independently graded each kidney as obstructed, equivocal, or non-obstructed.
RESULTS: Log-linear modeling showed that RENEX and the expert consensus had beyond-chance agreement in both non-obstructed and obstructed readings (both p < 0.0001). Moreover, pairwise agreement between experts and pairwise agreement between each expert and RENEX were not significantly different (p = 0.41, 0.95, and 0.81 for the non-obstructed, equivocal, and obstructed categories, respectively). Similarly, the three-way agreement of the three experts and the three-way agreement of two experts and RENEX were not significantly different for the non-obstructed (p = 0.79) and obstructed (p = 0.49) categories.
CONCLUSION: Log-linear modeling showed that RENEX was equivalent to any expert in rating kidneys, particularly in the obstructed and non-obstructed categories. This conclusion, which could not be derived from the original ROC and kappa analysis, emphasizes and illustrates the role and importance of log-linear modeling in the absence of a gold standard. The log-linear analysis also provides additional evidence that RENEX has the potential to assist in the interpretation of diuresis renography studies.