ConReg-R: Extrapolative recalibration of the empirical distribution of p-values to improve false discovery rate estimates.

Affiliation

Computational & Mathematical Biology, Genome Institute of Singapore, 60 Biopolis Street, Singapore 138672, Singapore.

Publication information

Biol Direct. 2011 May 20;6:27. doi: 10.1186/1745-6150-6-27.

DOI:10.1186/1745-6150-6-27

PMID:21595983

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3130718/

Abstract

BACKGROUND

False discovery rate (FDR) control is commonly accepted as the most appropriate error control in multiple hypothesis testing problems. The accuracy of FDR estimation depends on the accuracy of the p-values estimated for each test and on the validity of the underlying distributional assumptions. However, in many practical testing problems, such as those in genomics, p-values can be under-estimated or over-estimated for many known or unknown reasons. The FDR estimates are then distorted and become unreliable.

RESULTS

We propose a new extrapolative method called Constrained Regression Recalibration (ConReg-R) that recalibrates the empirical p-values by modeling their distribution in order to improve the FDR estimates. ConReg-R is based on two observations: accurately estimated p-values from true null hypotheses follow a uniform distribution, and the observed distribution of p-values is a mixture of the distributions of p-values from true null hypotheses and from true alternative hypotheses. ConReg-R therefore recalibrates the observed p-values so that they exhibit the properties of an ideal empirical p-value distribution. The proportion of true null hypotheses (π0) and the FDR are estimated after the recalibration.
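The mixture view described above can be illustrated with a minimal sketch. It is not the ConReg-R constrained regression itself, which the abstract does not specify. Instead it assumes well-calibrated p-values and uses a simple tail-based (Storey-type) estimate of π0, followed by the corresponding plug-in FDR estimate. The function names, the cutoff `lam`, and the simulated data (uniform nulls, alternatives drawn as `u ** 8`) are all assumptions chosen for illustration.

```python
import random


def estimate_pi0(pvalues, lam=0.5):
    """Tail-based estimate of the proportion of true null hypotheses.

    Assumption: p-values above `lam` come almost entirely from true nulls,
    which are uniform on [0, 1], so the density in that tail is about pi0.
    """
    m = len(pvalues)
    tail = sum(1 for p in pvalues if p > lam)
    return min(1.0, tail / (m * (1.0 - lam)))


def estimate_fdr(pvalues, threshold, pi0):
    """Plug-in FDR estimate when rejecting every p-value <= threshold."""
    m = len(pvalues)
    rejected = sum(1 for p in pvalues if p <= threshold)
    if rejected == 0:
        return 0.0
    # Expected number of null p-values below the threshold is pi0 * m * threshold.
    return min(1.0, pi0 * m * threshold / rejected)


def simulate_pvalues(m=10000, pi0=0.8, seed=0):
    """Hypothetical mixture: uniform nulls plus alternatives skewed toward 0."""
    rng = random.Random(seed)
    n_null = int(m * pi0)
    nulls = [rng.random() for _ in range(n_null)]
    alternatives = [rng.random() ** 8 for _ in range(m - n_null)]
    return nulls + alternatives


pvals = simulate_pvalues()
pi0_hat = estimate_pi0(pvals)
fdr_hat = estimate_fdr(pvals, 0.01, pi0_hat)
print("estimated pi0:", round(pi0_hat, 3))
print("estimated FDR at p <= 0.01:", round(fdr_hat, 3))
```

Both estimates rely on the null p-values being uniform. When the p-values are systematically under- or over-estimated, the tail density no longer reflects π0, which is the situation that recalibration is meant to address.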

CONCLUSIONS

ConReg-R provides an efficient way to improve the FDR estimates. It only requires the p-values from the tests and avoids permutation of the original test data. We demonstrate that the proposed method significantly improves FDR estimation on several gene expression datasets obtained from microarray and RNA-seq experiments.
