Pitfalls of hypothesis tests and model selection on bootstrap samples: Causes and consequences in biometrical applications.

Author information

Janitza Silke, Binder Harald, Boulesteix Anne-Laure

Affiliations

Department of Medical Informatics, Biometry and Epidemiology, University of Munich, Marchioninistr. 15, 81377 Munich, Germany.

Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center Johannes Gutenberg University Mainz, Obere Zahlbacher Str. 69, 55131 Mainz, Germany.

Publication information

Biom J. 2016 May;58(3):447-73. doi: 10.1002/bimj.201400246. Epub 2015 Sep 15.

DOI:10.1002/bimj.201400246

PMID:26372408

Abstract

The bootstrap method has become a widely used tool applied in diverse areas where results based on asymptotic theory are scarce. It can be applied, for example, for assessing the variance of a statistic, a quantile of interest or for significance testing by resampling from the null hypothesis. Recently, some approaches have been proposed in the biometrical field where hypothesis testing or model selection is performed on a bootstrap sample as if it were the original sample. P-values computed from bootstrap samples have been used, for example, in the statistics and bioinformatics literature for ranking genes with respect to their differential expression, for estimating the variability of p-values and for model stability investigations. Procedures which make use of bootstrapped information criteria are often applied in model stability investigations and model averaging approaches as well as when estimating the error of model selection procedures which involve tuning parameters. From the literature, however, there is evidence that p-values and model selection criteria evaluated on bootstrap data sets do not represent what would be obtained on the original data or new data drawn from the overall population. We explain the reasons for this and, through the use of a real data set and simulations, we assess the practical impact on procedures relevant to biometrical applications in cases where it has not yet been studied. Moreover, we investigate the behavior of subsampling (i.e., drawing from a data set without replacement) as a potential alternative solution to the bootstrap for these procedures.
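The contrast described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code or study design: it assumes two groups drawn from the same normal distribution (so the null hypothesis is true), a simple normal-approximation test for a difference in means, and a subsample size of 0.632 n. The function names `welch_z_pvalue` and `rejection_rates` and all parameter values are hypothetical choices made for illustration.

```python
import math
import random
import statistics


def welch_z_pvalue(x, y):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    z = (statistics.fmean(x) - statistics.fmean(y)) / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def rejection_rates(n=30, n_sim=2000, alpha=0.05, seed=1):
    """Share of simulated null data sets in which the test rejects at level alpha.

    The same test is applied to the original sample, to one bootstrap sample
    (drawn with replacement within each group) and to one subsample (drawn
    without replacement within each group). The subsample size 0.632 * n is an
    assumption made for this sketch.
    """
    rng = random.Random(seed)
    m = round(0.632 * n)
    counts = {"original": 0, "bootstrap": 0, "subsample": 0}
    for _ in range(n_sim):
        # Both groups come from N(0, 1), so every rejection is a false positive.
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        samples = {
            "original": (x, y),
            "bootstrap": (rng.choices(x, k=n), rng.choices(y, k=n)),
            "subsample": (rng.sample(x, m), rng.sample(y, m)),
        }
        for name, (a, b) in samples.items():
            if welch_z_pvalue(a, b) < alpha:
                counts[name] += 1
    return {name: count / n_sim for name, count in counts.items()}


if __name__ == "__main__":
    for name, rate in rejection_rates().items():
        print(f"{name:>9}: rejection rate under the null = {rate:.3f}")
```

In this setup the test applied to bootstrap samples typically rejects far more often than the nominal level, because resampling with replacement adds variability to the group means that the estimated standard error does not reflect. The original samples and the subsamples stay close to the nominal level, with the subsample test operating at a smaller sample size.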
