Center for Evidence-based Medicine, and Department of Health Service, Policy and Practice, Brown University, Providence, RI 02912, USA.
J Gen Intern Med. 2012 Jun;27 Suppl 1(Suppl 1):S67-75. doi: 10.1007/s11606-012-2031-7.
The classical paradigm for evaluating test performance compares the results of an index test with those of a reference test. When the reference test does not mirror the "truth" adequately (i.e., it is an "imperfect" reference standard), the typical ("naïve") estimates of sensitivity and specificity are biased. One has at least four options when performing a systematic review of test performance with an "imperfect" reference standard: (a) to forgo the classical paradigm and assess the index test's ability to predict patient-relevant outcomes instead of test accuracy (i.e., treat the index test as a predictive instrument); (b) to assess whether the results of the two tests (index and reference) agree or disagree (i.e., treat them as two alternative measurement methods); (c) to calculate "naïve" estimates of the index test's sensitivity and specificity from each study included in the review and discuss in which direction they are biased; (d) to mathematically adjust the "naïve" estimates of sensitivity and specificity of the index test to account for the imperfect reference standard. We discuss these options and illustrate some of them through examples.
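To make options (c) and (d) concrete, the following is a minimal sketch of one possible adjustment, not necessarily the approach taken in the paper. It rests on two assumptions that the abstract does not state: the sensitivity and specificity of the reference standard are known, and the errors of the index and reference tests are conditionally independent given true disease status. All names (`Table`, `naive_accuracy`, `adjusted_accuracy`, `expected_table`) and all numbers are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Tuple


class Table(NamedTuple):
    """2x2 cross-classification of index test against reference test."""
    a: float  # index positive, reference positive
    b: float  # index positive, reference negative
    c: float  # index negative, reference positive
    d: float  # index negative, reference negative


def naive_accuracy(t: Table) -> Tuple[float, float]:
    """'Naive' sensitivity and specificity: treat the reference as the truth."""
    return t.a / (t.a + t.c), t.d / (t.b + t.d)


def adjusted_accuracy(t: Table, se_ref: float, sp_ref: float) -> Tuple[float, float]:
    """Algebraic correction for an imperfect reference standard.

    Assumes se_ref and sp_ref are known and that index and reference errors
    are conditionally independent given true disease status.
    """
    n = sum(t)
    ref_pos = t.a + t.c
    se_den = ref_pos - n * (1.0 - sp_ref)  # proportional to adjusted prevalence
    sp_den = n * se_ref - ref_pos          # proportional to 1 - adjusted prevalence
    if se_ref + sp_ref <= 1.0 or se_den <= 0.0 or sp_den <= 0.0:
        raise ValueError("table is incompatible with the assumed reference accuracy")
    se = (t.a * sp_ref - t.b * (1.0 - sp_ref)) / se_den
    sp = (t.d * se_ref - t.c * (1.0 - se_ref)) / sp_den
    return se, sp


def expected_table(n: float, prev: float, se_idx: float, sp_idx: float,
                   se_ref: float, sp_ref: float) -> Table:
    """Expected cell counts under conditional independence (illustration only)."""
    dis, non = prev, 1.0 - prev
    return Table(
        a=n * (dis * se_idx * se_ref + non * (1 - sp_idx) * (1 - sp_ref)),
        b=n * (dis * se_idx * (1 - se_ref) + non * (1 - sp_idx) * sp_ref),
        c=n * (dis * (1 - se_idx) * se_ref + non * sp_idx * (1 - sp_ref)),
        d=n * (dis * (1 - se_idx) * (1 - se_ref) + non * sp_idx * sp_ref),
    )


# Hypothetical scenario: true index accuracy 0.90 / 0.80, reference 0.85 / 0.95.
table = expected_table(10_000, 0.30, 0.90, 0.80, 0.85, 0.95)
naive_se, naive_sp = naive_accuracy(table)
adj_se, adj_sp = adjusted_accuracy(table, se_ref=0.85, sp_ref=0.95)
print(f"naive:    Se={naive_se:.3f}, Sp={naive_sp:.3f}")
print(f"adjusted: Se={adj_se:.3f}, Sp={adj_sp:.3f}")
```

In this hypothetical scenario the naïve estimates understate both the sensitivity and the specificity of the index test, and the adjustment recovers the values used to build the table. With real study counts, sampling variability can push the adjusted estimates outside the interval from 0 to 1, and the result is only as trustworthy as the assumed accuracy of the reference standard and the conditional independence assumption.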