Division of Imaging and Applied Mathematics, Office of Science and Engineering Laboratories, Center for Devices and Radiological Health, Food and Drug Administration, Silver Spring, MD, USA.
Stat Med. 2012 Mar 15;31(6):501-15. doi: 10.1002/sim.4432. Epub 2011 Nov 17.
In evaluating the discriminatory performance of a new modality in a screening setting, a logistical constraint is that the prevalence of the disease of interest is typically very low. This implies that, under a standard study design, large numbers of subjects have to be evaluated using the new modality. However, if a predicate modality exists in clinical practice, one can base inclusion into the study of the new modality on the clinical results from the predicate to 'enrich' the population of diseased subjects in the study. If this enrichment is not accounted for when estimating sensitivity, specificity, and area under the ROC curve, these 'naive' estimates may be substantially biased compared with expected performance in the intended use population. We derive expressions for the magnitude of this bias in terms of correlations of modality scores. When such estimates are 'corrected' for the sampling weights using inverse probability weighting, the variances of the estimates of the above quantities are affected. We derive here analytic expressions for these variances. For a fixed number of diseased subjects, differential sampling increases the variance of the (corrected) estimates, all other things being equal. However, differential sampling also increases the number with disease for fixed total study size, which decreases the variance of the sensitivity and area under the ROC curve estimates, all other things being equal. The balance of these two effects determines the gain in efficiency when using enrichment and corrected estimates. These principles are illustrated with a simulation study motivated by the Digital Mammographic Imaging Screening Trial, a trial of digital versus screen-film mammography.
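The effect of enrichment on naive and inverse-probability-weighted estimates can be illustrated with a small simulation. The sketch below is not the simulation reported in the paper; every name and parameter in it (`simulate`, `rho`, `shift`, `threshold`, `p_neg`, the bivariate normal score model, and the rule of sampling all predicate-positive subjects plus a fraction of the rest) is an assumption chosen for illustration. It uses normalized (Hajek-style) weights equal to the inverse of each subject's sampling probability.

```python
import numpy as np


def weighted_rate(indicator, w):
    """Weighted proportion of subjects for which `indicator` is true."""
    return float(np.sum(w * indicator) / np.sum(w))


def weighted_auc(pos, pos_w, neg, neg_w):
    """Weighted estimate of P(diseased score > non-diseased score).

    Assumes continuous scores, so ties are ignored.
    """
    order = np.argsort(neg)
    neg_sorted = neg[order]
    # cum_w[k] = total weight of the k lowest-scoring non-diseased subjects
    cum_w = np.concatenate(([0.0], np.cumsum(neg_w[order])))
    below = cum_w[np.searchsorted(neg_sorted, pos, side="left")]
    return float(np.sum(pos_w * below) / (np.sum(pos_w) * np.sum(neg_w)))


def summarize(new, diseased, w, threshold):
    """Sensitivity, specificity, and AUC of the new modality under weights w."""
    pos, neg = diseased, ~diseased
    return {
        "sens": weighted_rate(new[pos] > threshold, w[pos]),
        "spec": weighted_rate(new[neg] <= threshold, w[neg]),
        "auc": weighted_auc(new[pos], w[pos], new[neg], w[neg]),
    }


def simulate(n=300_000, prevalence=0.01, rho=0.6, shift=1.5,
             threshold=1.0, p_neg=0.2, seed=0):
    """Hypothetical enrichment design; all parameter values are illustrative."""
    rng = np.random.default_rng(seed)
    diseased = rng.random(n) < prevalence
    # Correlated predicate / new-modality scores (assumed bivariate normal).
    cov = [[1.0, rho], [rho, 1.0]]
    scores = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    scores[diseased] += shift  # disease raises both scores
    predicate, new = scores[:, 0], scores[:, 1]

    # Enrichment: keep every predicate-positive subject and only a
    # fraction p_neg of the predicate-negative subjects.
    p_sample = np.where(predicate > threshold, 1.0, p_neg)
    s = rng.random(n) < p_sample

    return {
        "population": summarize(new, diseased, np.ones(n), threshold),
        "naive": summarize(new[s], diseased[s], np.ones(int(s.sum())), threshold),
        "ipw": summarize(new[s], diseased[s], 1.0 / p_sample[s], threshold),
    }


if __name__ == "__main__":
    for name, est in simulate().items():
        print(name, {k: round(v, 3) for k, v in est.items()})
```

Under these assumptions, the positive correlation between the two modalities means the enriched sample over-represents subjects with high new-modality scores, so the naive sensitivity should come out too high and the naive specificity too low relative to the full population, while the weighted estimates should track the population values more closely. Varying `p_neg` for a fixed study size gives a rough feel for the variance trade-off described in the abstract.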