Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, 3362 Fifth Avenue, Pittsburgh, Pennsylvania 15213, USA.
Med Phys. 2010 Nov;37(11):5821-30. doi: 10.1118/1.3503849.
When comparing binary test results from two diagnostic systems, superiority in both "sensitivity" and "specificity" also implies differences in all conventional summary indices and, locally, in the underlying receiver operating characteristic (ROC) curves. However, when one of the two binary tests has higher sensitivity and lower specificity (or vice versa), comparing their performance levels is nontrivial, and different summary indices may lead to contradictory conclusions. A frequently used approach that is free of the subjectivity associated with summary indices is to compare the underlying ROC curves, which requires collecting rating data on multicategory scales, whether natural or experimentally imposed. However, data for reliable estimation of ROC curves are frequently unavailable. The purpose of this article is to develop an approach that uses "diagnostic likelihood ratios", namely, likelihood ratios of "positive" or "negative" responses, to make simple inferences about the underlying ROC curves and associated areas when reliable rating data are absent, or about the relative binary characteristics when these are of primary interest.
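For readers less familiar with the quantities involved, the following minimal sketch computes the two diagnostic likelihood ratios from a test's sensitivity and specificity using their standard definitions. The function name `diagnostic_likelihood_ratios` and the example values are illustrative assumptions and do not come from the article.

```python
def diagnostic_likelihood_ratios(sensitivity, specificity):
    """Return (DLR+, DLR-) for a binary diagnostic test.

    DLR+ = P(positive | diseased) / P(positive | non-diseased)
         = sensitivity / (1 - specificity)
    DLR- = P(negative | diseased) / P(negative | non-diseased)
         = (1 - sensitivity) / specificity
    """
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must lie in [0, 1]")
    if not 0.0 < specificity < 1.0:
        # Excluding 0 and 1 keeps both ratios finite in this sketch.
        raise ValueError("specificity must lie strictly between 0 and 1")
    dlr_positive = sensitivity / (1.0 - specificity)
    dlr_negative = (1.0 - sensitivity) / specificity
    return dlr_positive, dlr_negative


if __name__ == "__main__":
    # Hypothetical test with 90% sensitivity and 80% specificity.
    dlr_pos, dlr_neg = diagnostic_likelihood_ratios(0.90, 0.80)
    print(f"DLR+ = {dlr_pos:.3f}, DLR- = {dlr_neg:.3f}")
```

On the ROC plane, DLR+ is the slope of the line from (0, 0) to the test's operating point, and DLR- is the slope of the line from that point to (1, 1), which is why these ratios connect naturally to statements about the underlying curve.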
For inferences related to the underlying curves, the authors exploit the assumption of concavity of the true underlying ROC curve to describe the conditions under which these curves must differ and the conditions under which they have different areas. For scenarios in which the binary characteristics are of primary interest, the authors use characteristics of "chance performance" to demonstrate that the derived conditions provide strong evidence of the superiority of one binary test over another. By relating these derived conditions to hypotheses about the true likelihood ratios of the two binary diagnostic tests being compared, the authors enable a straightforward statistical procedure for the corresponding inferences.
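The concavity argument can be illustrated geometrically. A concave ROC curve through an operating point must lie on or above the chord from (0, 0) to that point and the chord from that point to (1, 1). The sketch below applies that reasoning to two operating points. It is one reading of the kind of condition the abstract describes, not the authors' exact criteria: the function name `favoured_test`, the return labels, and the restriction to better-than-chance tests are assumptions, and the conditions for differing areas are not covered.

```python
def favoured_test(sens1, spec1, sens2, spec2):
    """Compare two binary tests via the chords of a concave ROC curve.

    Returns "test 1", "test 2", or "inconclusive". The inputs are treated
    as true (or point-estimate) values; no sampling error is considered.
    """
    for value in (sens1, spec1, sens2, spec2):
        if not 0.0 < value < 1.0:
            raise ValueError("all inputs must lie strictly between 0 and 1")
    # Assumption of this sketch: both tests perform better than chance.
    if sens1 + spec1 <= 1.0 or sens2 + spec2 <= 1.0:
        raise ValueError("both tests are assumed to be better than chance")

    # Concordant case: one test is at least as good on both characteristics.
    if sens1 >= sens2 and spec1 >= spec2:
        identical = sens1 == sens2 and spec1 == spec2
        return "inconclusive" if identical else "test 1"
    if sens2 >= sens1 and spec2 >= spec1:
        return "test 2"

    # Discordant case: "high" has higher sensitivity but lower specificity.
    if sens1 > sens2:
        high, low = ("test 1", sens1, spec1), ("test 2", sens2, spec2)
    else:
        high, low = ("test 2", sens2, spec2), ("test 1", sens1, spec1)
    high_name, tpf_h, spec_h = high
    low_name, tpf_l, spec_l = low
    fpf_h, fpf_l = 1.0 - spec_h, 1.0 - spec_l

    # DLR+(low) < DLR+(high): the "low" point lies below the chord from
    # (0, 0) to the "high" point, hence below any concave curve through it.
    if tpf_l * fpf_h < tpf_h * fpf_l:
        return high_name
    # DLR-(high) > DLR-(low): the "high" point lies below the chord from
    # the "low" point to (1, 1), hence below any concave curve through it.
    if (1.0 - tpf_h) * spec_l > (1.0 - tpf_l) * spec_h:
        return low_name
    return "inconclusive"


if __name__ == "__main__":
    # Hypothetical operating points, chosen only for illustration.
    print(favoured_test(0.90, 0.70, 0.50, 0.80))
    print(favoured_test(0.90, 0.70, 0.80, 0.90))
```

The comparisons use cross-multiplied products instead of the ratios themselves, which avoids division and keeps the inequalities well defined near the edges of the ROC plane. Applied to estimates, a result other than "inconclusive" only flags a potential difference that still has to be tested statistically.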
The authors derived simple algebraic and graphical methods for describing the conditions for superiority of one of two diagnostic tests with respect to their binary characteristics, the underlying ROC curves, or the areas under the curves. The graphical regions are useful for identifying potential differences between two systems, which then have to be tested statistically. The simple statistical tests can be performed with well-known methods for comparing diagnostic likelihood ratios. The developed approach offers a solution for some of the more difficult-to-analyze scenarios, in which diagnostic tests do not demonstrate concordant differences in both sensitivity and specificity. In addition, the resulting inferences do not contradict the conclusions that can be obtained using conventional and reasonably defined summary indices.
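As an illustration of what such a statistical comparison can look like, the sketch below tests the equality of two positive likelihood ratios using the standard large-sample (delta-method) variance of log DLR+. It assumes the two tests were evaluated on independent samples; paired designs, in which both tests are applied to the same subjects, need methods that account for the correlation. The abstract does not say which procedure the authors used, so the function names and the counts in the example are hypothetical.

```python
import math


def log_dlr_positive(tp, fn, fp, tn):
    """Return (log DLR+, approximate variance) from a 2x2 table of counts."""
    if min(tp, fn, fp, tn) <= 0:
        raise ValueError("this sketch requires all four counts to be positive")
    n_diseased = tp + fn
    n_healthy = fp + tn
    log_dlr = math.log((tp / n_diseased) / (fp / n_healthy))
    # Delta-method variance of the log of a ratio of two binomial proportions.
    variance = 1.0 / tp - 1.0 / n_diseased + 1.0 / fp - 1.0 / n_healthy
    return log_dlr, variance


def compare_dlr_positive(counts1, counts2):
    """Two-sided z-test of H0: DLR+ of test 1 equals DLR+ of test 2.

    counts1 and counts2 are (tp, fn, fp, tn) tuples from independent samples.
    Returns (z statistic, two-sided p-value).
    """
    log1, var1 = log_dlr_positive(*counts1)
    log2, var2 = log_dlr_positive(*counts2)
    z = (log1 - log2) / math.sqrt(var1 + var2)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: 100 diseased and 100 non-diseased subjects per test.
    z, p = compare_dlr_positive((90, 10, 30, 70), (50, 50, 20, 80))
    print(f"z = {z:.3f}, two-sided p = {p:.3f}")
```

The same construction applies to DLR- by swapping the roles of positive and negative results in each table.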
When binary diagnostic tests are of primary interest, the proposed approach offers an objective and powerful method for comparing two binary diagnostic tests. A significant advantage of this method is that it enables objective analyses when one test has higher sensitivity but lower specificity, while ensuring agreement with study conclusions based on other reasonable and widely accepted summary indices. For truly multicategory diagnostic tests, the proposed method can help in concluding inferiority of one of the diagnostic tests based on binary data, thereby potentially avoiding the need to conduct a more expensive multicategory ROC study.