A general method for accurate estimation of false discovery rates in identification of differentially expressed genes.

Affiliations

College of Life Science, Hunan Normal University, Changsha, Hunan 410087, China and Department of Biostatistics and Epidemiology, Georgia Regents University, Augusta, GA 30912-4900, USA.

Publication information

Bioinformatics. 2014 Jul 15;30(14):2018-25. doi: 10.1093/bioinformatics/btu124. Epub 2014 Mar 14.

Abstract

'Omic' data, such as genomic, transcriptomic, proteomic and single nucleotide polymorphism data, have been growing rapidly. These data are large-scale and high-throughput; they challenge traditional statistical methodologies and require multiple testing. Several multiple-testing procedures, such as the Bonferroni procedure, the Benjamini-Hochberg (BH) procedure and the Westfall-Young procedure, have been developed; some control the family-wise error rate and others control the false discovery rate (FDR). These procedures are valid in some cases but cannot be applied to all types of large-scale data. To address this statistically challenging problem in the analysis of omic data, we propose a general method for generating a set of multiple-testing procedures. The method is based on the BH theorems. By choosing a C-value, one can realize a specific multiple-testing procedure. For example, setting C = 1.22 yields the BH procedure. With C < 1.22, the method generates procedures that weakly control FDR, and with C > 1.22, procedures that strongly control FDR. The choices C = G (the number of genes or tests) and C = 0 give, respectively, the Bonferroni procedure and the single-testing procedure, which are the two extreme procedures in this family. To help one choose an appropriate multiple-testing procedure in practice, we develop an algorithm by which FDR can be correctly and reliably estimated. Simulation results show that our method estimates FDR accurately in various scenarios, and we illustrate its application with three real datasets.

AVAILABILITY AND IMPLEMENTATION

Our program is implemented in Matlab and is available upon request.
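
For readers unfamiliar with the two reference procedures named in the abstract, the following is a minimal Python sketch of the textbook Bonferroni and Benjamini-Hochberg step-up rules. It is not the authors' Matlab program and does not implement their C-value family or their FDR estimation algorithm, neither of which is specified in the abstract. The function names, the example p-values and the level alpha = 0.05 are illustrative assumptions.

```python
from typing import List


def bonferroni_reject(pvalues: List[float], alpha: float = 0.05) -> List[bool]:
    """Textbook Bonferroni rule: reject test i when p_i <= alpha / G."""
    g = len(pvalues)
    return [p <= alpha / g for p in pvalues]


def bh_reject(pvalues: List[float], alpha: float = 0.05) -> List[bool]:
    """Textbook Benjamini-Hochberg step-up rule.

    Sort the p-values, find the largest rank k with p_(k) <= k * alpha / G,
    and reject the k tests with the smallest p-values.
    """
    g = len(pvalues)
    order = sorted(range(g), key=lambda i: pvalues[i])  # indices by ascending p
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / g:
            k_max = rank  # keep the largest rank that passes its threshold
    reject = [False] * g
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical p-values for G = 10 tests (made up for illustration only).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

n_bonferroni = sum(bonferroni_reject(pvals))
n_bh = sum(bh_reject(pvals))
print("Bonferroni rejections:", n_bonferroni)
print("BH rejections:", n_bh)
```

On these made-up p-values the Bonferroni rule is the more conservative of the two, which matches its role in the abstract as one extreme of the family.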
