Optimal two-phase sampling design for comparing accuracies of two binary classification rules.

Author information

Department of Biostatistics, Indiana University School of Public Health and School of Medicine, Indianapolis, IN, U.S.A.

Publication information

Stat Med. 2014 Feb 10;33(3):500-13. doi: 10.1002/sim.5946. Epub 2013 Sep 4.

DOI:10.1002/sim.5946

PMID:24038175

Abstract

In this paper, we consider the design for comparing the performance of two binary classification rules, for example, two record linkage algorithms or two screening tests. Statistical methods are well developed for comparing these accuracy measures when the gold standard is available for every unit in the sample, or in a two-phase study when the gold standard is ascertained only in the second phase in a subsample using a fixed sampling scheme. However, these methods do not attempt to optimize the sampling scheme to minimize the variance of the estimators of interest. In comparing the performance of two classification rules, the parameters of primary interest are the difference in sensitivities, specificities, and positive predictive values. We derived the analytic variance formulas for these parameter estimates and used them to obtain the optimal sampling design. The efficiency of the optimal sampling design is evaluated through an empirical investigation that compares the optimal sampling with simple random sampling and with proportional allocation. Results of the empirical study show that the optimal sampling design is similar for estimating the difference in sensitivities and in specificities, and both achieve a substantial amount of variance reduction with an over-sample of subjects with discordant results and under-sample of subjects with concordant results. A heuristic rule is recommended when there is no prior knowledge of individual sensitivities and specificities, or the prevalence of the true positive findings in the study population. The optimal sampling is applied to a real-world example in record linkage to evaluate the difference in classification accuracy of two matching algorithms.
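As a rough illustration of the two-phase setting described in the abstract, the sketch below estimates the difference in sensitivities when both classification rules are applied to every unit in phase one and the gold standard is verified only in a stratified subsample in phase two. It uses a standard stratum-weighted (inverse-probability) estimator, not the paper's variance formulas or its optimal allocation, which the abstract does not state. The function name, argument names, and all counts are hypothetical.

```python
def sensitivity_difference(phase1_counts, phase2_sampled, phase2_positive):
    """Estimate sensitivities of two rules from a two-phase sample.

    Each argument maps a cell (t1, t2) of the two rules' results to a count:
      phase1_counts   - units in the cell among all phase-one units
      phase2_sampled  - units in the cell verified by the gold standard
      phase2_positive - verified units in the cell that are truly positive
    Assumes phase-two sampling is random within each (t1, t2) cell.
    """
    total = sum(phase1_counts.values())
    joint = {}  # estimated P(T1 = t1, T2 = t2, D = 1) for each cell
    for cell, cell_count in phase1_counts.items():
        verified = phase2_sampled[cell]
        if verified == 0:
            raise ValueError(f"no verified units in cell {cell}")
        cell_share = cell_count / total
        positive_rate = phase2_positive[cell] / verified
        joint[cell] = cell_share * positive_rate

    prevalence = sum(joint.values())  # estimated P(D = 1)
    sens1 = (joint[(1, 1)] + joint[(1, 0)]) / prevalence
    sens2 = (joint[(1, 1)] + joint[(0, 1)]) / prevalence
    return sens1, sens2, sens1 - sens2


# Hypothetical counts: discordant cells (1, 0) and (0, 1) are fully verified,
# concordant cells are verified at a much lower rate.
phase1 = {(1, 1): 400, (1, 0): 100, (0, 1): 50, (0, 0): 450}
sampled = {(1, 1): 40, (1, 0): 50, (0, 1): 50, (0, 0): 40}
positive = {(1, 1): 36, (1, 0): 30, (0, 1): 10, (0, 0): 2}

sens1, sens2, diff = sensitivity_difference(phase1, sampled, positive)
print(f"sens1={sens1:.4f} sens2={sens2:.4f} difference={diff:.4f}")
```

In this estimator the concordant-positive cell cancels out of the difference, so the numerator depends only on the two discordant cells. That is consistent with the abstract's finding that over-sampling discordant results reduces variance, although the actual optimal sampling fractions would have to come from the paper's formulas.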
