Division of Biostatistics, University of Southern California, Keck School of Medicine, Arcadia, CA 91006, U.S.A.
Stat Med. 2011 Jul 10;30(15):1852-64. doi: 10.1002/sim.4232. Epub 2011 Apr 15.
Sensitivity, specificity, and positive and negative predictive values are typically used to quantify the accuracy of a binary screening test. In some studies, it may not be ethical or feasible to obtain definitive disease ascertainment for all subjects using a gold standard test. When a gold standard test cannot be used, an imperfect reference test that is less than 100 per cent sensitive and specific may be used instead. In breast cancer screening, for example, follow-up for cancer diagnosis is used as an imperfect reference test for women for whom gold standard results cannot be obtained. This incomplete ascertainment of true disease, or differential disease verification, can result in biased estimates of accuracy. In this paper, we derive the apparent accuracy values for studies subject to differential verification. We determine how the bias is affected by the accuracy of the imperfect reference test, the percentage of subjects who receive the imperfect reference test instead of the gold standard, the prevalence of the disease, and the correlation between the results of the screening test and the imperfect reference test. We show that designs with differential disease verification can yield biased estimates of accuracy, and that estimates of sensitivity in cancer screening trials may be substantially biased. However, careful design decisions, including selection of the imperfect reference test, can help to minimize bias. A hypothetical breast cancer screening study is used to illustrate the problem.
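The mechanism described in the abstract can be illustrated with a minimal sketch. It is not the paper's derivation: it assumes that the imperfect reference test is conditionally independent of the screening test given true disease status (the paper also considers correlation between the two), and that the gold standard is perfect. The function name `apparent_accuracy`, its parameters, and the verification fractions `v_pos` and `v_neg` are hypothetical names introduced only for this illustration.

```python
def apparent_accuracy(prev, se, sp, se_ref, sp_ref, v_pos=1.0, v_neg=0.0):
    """Apparent accuracy of a screening test under differential verification.

    Illustrative assumptions (not taken from the paper):
      * the gold standard is perfect;
      * the imperfect reference test is conditionally independent of the
        screening test given true disease status;
      * v_pos / v_neg are the fractions of screen-positive / screen-negative
        subjects verified by the gold standard; everyone else is verified
        by the imperfect reference test.
    """
    def p_labelled_diseased(diseased, v):
        # Probability that verification labels the subject as diseased.
        gold = 1.0 if diseased else 0.0
        ref = se_ref if diseased else 1.0 - sp_ref
        return v * gold + (1.0 - v) * ref

    # Joint probabilities of screening result and TRUE disease status.
    t_pos_d = prev * se
    t_neg_d = prev * (1.0 - se)
    t_pos_nd = (1.0 - prev) * (1.0 - sp)
    t_neg_nd = (1.0 - prev) * sp

    # Joint probabilities of screening result and APPARENT disease status.
    a = (t_pos_d * p_labelled_diseased(True, v_pos)
         + t_pos_nd * p_labelled_diseased(False, v_pos))            # T+, labelled diseased
    b = (t_pos_d * (1.0 - p_labelled_diseased(True, v_pos))
         + t_pos_nd * (1.0 - p_labelled_diseased(False, v_pos)))    # T+, labelled healthy
    c = (t_neg_d * p_labelled_diseased(True, v_neg)
         + t_neg_nd * p_labelled_diseased(False, v_neg))            # T-, labelled diseased
    d = (t_neg_d * (1.0 - p_labelled_diseased(True, v_neg))
         + t_neg_nd * (1.0 - p_labelled_diseased(False, v_neg)))    # T-, labelled healthy

    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "ppv": a / (a + b),
        "npv": d / (c + d),
    }


if __name__ == "__main__":
    # Hypothetical inputs: screen positives get the gold standard,
    # screen negatives get follow-up only.
    result = apparent_accuracy(prev=0.01, se=0.8, sp=0.9,
                               se_ref=0.5, sp_ref=1.0)
    for name, value in result.items():
        print(f"{name}: {value:.4f}")
```

Under these assumptions, setting `v_pos = v_neg = 1.0` (everyone receives the gold standard) or `se_ref = sp_ref = 1.0` (a perfect reference test) recovers the true sensitivity and specificity, whereas a reference test that misses cancers among screen negatives inflates the apparent sensitivity.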