Bias and variance reduction in estimating the proportion of true-null hypotheses.

Authors

Cheng Yebin, Gao Dexiang, Tong Tiejun

Affiliations

School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, PR China.

Department of Biostatistics and Informatics, University of Colorado, Denver, CO, USA.

Publication information

Biostatistics. 2015 Jan;16(1):189-204. doi: 10.1093/biostatistics/kxu029. Epub 2014 Jun 23.

DOI:10.1093/biostatistics/kxu029

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4263223/

Abstract

When testing a large number of hypotheses, estimating the proportion of true nulls, denoted by π(0), becomes increasingly important. This quantity has many applications in practice. For instance, a reliable estimate of π(0) can eliminate the conservative bias of the Benjamini-Hochberg procedure in controlling the false discovery rate. Most methods in the literature for estimating π(0) are known to be conservative. Recently, some attempts have been made to reduce this estimation bias. Nevertheless, these methods either overcorrect the bias or suffer from an unacceptably large estimation variance. In this paper, we propose a new method for estimating π(0) that aims to reduce the bias and the variance of the estimate simultaneously. To achieve this, we first utilize the probability density functions of false-null p-values and then propose a novel algorithm to estimate π(0). The statistical behavior of the proposed estimator is also investigated. Finally, we carry out extensive simulation studies and several real data analyses to evaluate the performance of the proposed estimator. Both simulated and real data demonstrate that the proposed method may improve substantially on existing methods.
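To make the role of π(0) concrete, the following is a minimal sketch and not the estimator proposed in the paper. It assumes a standard threshold-based (Storey-type) estimate of π(0) with a hypothetical tuning value `lam=0.5`, and plugs it into the Benjamini-Hochberg step-up rule by running the procedure at level alpha/π(0). The function names, the simulated Beta distribution for false-null p-values, and all parameter values are illustrative assumptions.

```python
import numpy as np

def storey_pi0(pvalues, lam=0.5):
    """Threshold-based estimate of pi0: #{p > lam} / (m * (1 - lam)), capped at 1.

    This is a conventional, typically conservative estimator, used here
    only as a stand-in; it is NOT the method proposed in the paper.
    """
    p = np.asarray(pvalues, dtype=float)
    return min(1.0, float(np.mean(p > lam)) / (1.0 - lam))

def bh_reject(pvalues, alpha=0.05, pi0=1.0):
    """Benjamini-Hochberg step-up rule.

    With pi0 = 1 this is the plain procedure; with an estimate pi0 < 1
    it is the adaptive version, run at level alpha / pi0.
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / (m * pi0)
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()   # largest rank meeting its threshold
        reject[order[:k + 1]] = True     # reject everything up to that rank
    return reject

# Simulated example (all settings are assumptions for illustration).
rng = np.random.default_rng(0)
m, true_pi0 = 2000, 0.8
n_null = int(m * true_pi0)
p_null = rng.uniform(size=n_null)                # true nulls: uniform p-values
p_alt = rng.beta(0.2, 4.0, size=m - n_null)      # false nulls: concentrated near 0
pvals = np.concatenate([p_null, p_alt])

pi0_hat = storey_pi0(pvals)
plain = bh_reject(pvals)
adaptive = bh_reject(pvals, pi0=pi0_hat)
print(f"estimated pi0: {pi0_hat:.3f}")
print(f"rejections, plain BH: {plain.sum()}, adaptive BH: {adaptive.sum()}")
```

Because the estimate never exceeds 1, the adaptive thresholds are at least as large as the plain ones, so the adaptive rule rejects at least as many hypotheses. How close it comes to the nominal false discovery rate depends on the bias and variance of the π(0) estimate, which is the issue the abstract addresses.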
