Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada.
Stat Med. 2012 May 20;31(11-12):1129-38. doi: 10.1002/sim.4444. Epub 2012 Feb 21.
When no gold standard is available to evaluate a diagnostic or screening test, as is often the case, an imperfect reference standard test must be used instead. Furthermore, the errors of the test and its reference standard may not be independent. Some authors have opined that positively dependent errors will lead to overestimation of test performance. Although positive dependence does increase agreement between the test and the reference standard, it is not clear if test accuracy will necessarily be overestimated in this situation, and the case of negatively associated test errors is even less clear. To examine this issue in more detail, we derive the apparent sensitivity, specificity, and overall accuracy of a test relative to an imperfect reference standard and the bias in these parameters. We demonstrate that either positive or negative bias can occur if the reference standard is imperfect. The type and magnitude of bias depend on several components: the disease prevalence, the true test sensitivity and specificity, the covariance between the false-negative test errors among the true disease cases, and the covariance between the false-positive test errors among the true noncases. If, for example, sensitivity and specificity are 0.8 for both the test and reference standard and the errors have a moderate positive dependence, test sensitivity is then underestimated at low prevalence but overestimated at high prevalence, while the opposite occurs for specificity. We illustrate these ideas through general numerical calculations and an empirical example of screening for breast cancer with magnetic resonance imaging and mammography.
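The numerical example in the abstract can be sketched under a standard conditional-dependence parameterization. This is an illustrative reconstruction, not necessarily the authors' exact notation. It assumes the joint probability that both tests are correct among true cases is Se_T·Se_R + cov_d, and among true noncases Sp_T·Sp_R + cov_n. The function name `apparent_accuracy`, its parameter names, and the covariance value 0.08 (half the maximum possible value of 0.16 when both sensitivities are 0.8) are assumptions chosen to represent "moderate positive dependence"; the abstract does not state a specific value.

```python
def apparent_accuracy(prev, se_t, sp_t, se_r, sp_r, cov_d=0.0, cov_n=0.0):
    """Apparent (sensitivity, specificity) of a test judged against an
    imperfect reference standard.

    prev          : disease prevalence
    se_t, sp_t    : true sensitivity and specificity of the test
    se_r, sp_r    : true sensitivity and specificity of the reference
    cov_d, cov_n  : error covariances among true cases / true noncases
                    (hypothetical names; the values used are assumptions)
    """
    # P(test+, reference+): true positives among cases plus
    # jointly false positives among noncases.
    both_pos = (prev * (se_t * se_r + cov_d)
                + (1 - prev) * ((1 - sp_t) * (1 - sp_r) + cov_n))
    # P(test-, reference-): jointly false negatives among cases plus
    # true negatives among noncases.
    both_neg = (prev * ((1 - se_t) * (1 - se_r) + cov_d)
                + (1 - prev) * (sp_t * sp_r + cov_n))
    # Marginal probability that the reference standard is positive.
    ref_pos = prev * se_r + (1 - prev) * (1 - sp_r)
    ref_neg = 1 - ref_pos
    return both_pos / ref_pos, both_neg / ref_neg


if __name__ == "__main__":
    # Se = Sp = 0.8 for both tests, assumed covariance 0.08 in each group.
    for prev in (0.01, 0.10, 0.50, 0.90):
        se_app, sp_app = apparent_accuracy(prev, 0.8, 0.8, 0.8, 0.8, 0.08, 0.08)
        print(f"prevalence={prev:.2f}  apparent Se={se_app:.3f}  "
              f"apparent Sp={sp_app:.3f}  (true values: 0.800)")
```

With these assumed inputs, the apparent sensitivity falls below 0.8 at low prevalence and rises above it at high prevalence, while the apparent specificity moves in the opposite direction, matching the pattern described in the abstract. Setting `se_r = sp_r = 1` and both covariances to zero recovers the true values, as expected for a perfect gold standard.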