A general method for accurate estimation of false discovery rates in identification of differentially expressed genes.

Author information

College of Life Science, Hunan Normal University, Changsha, Hunan 410087, China and Department of Biostatistics and Epidemiology, Georgia Regents University, Augusta, GA 30912-4900, USA.

Publication information

Bioinformatics. 2014 Jul 15;30(14):2018-25. doi: 10.1093/bioinformatics/btu124. Epub 2014 Mar 14.

DOI:10.1093/bioinformatics/btu124

PMID:24632499

Abstract

UNLABELLED

The 'omic' data such as genomic data, transcriptomic data, proteomic data and single nucleotide polymorphism data have been rapidly growing. The omic data are large-scale and high-throughput data. Such data challenge traditional statistical methodologies and require multiple tests. Several multiple-testing procedures such as Bonferroni procedure, Benjamini-Hochberg (BH) procedure and Westfall-Young procedure have been developed, among which some control family-wise error rate and the others control false discovery rate (FDR). These procedures are valid in some cases and cannot be applied to all types of large-scale data. To address this statistically challenging problem in the analysis of the omic data, we propose a general method for generating a set of multiple-testing procedures. This method is based on the BH theorems. By choosing a C-value, one can realize a specific multiple-testing procedure. For example, by setting C = 1.22, our method produces the BH procedure. With C < 1.22, our method generates procedures of weakly controlling FDR, and with C > 1.22, the procedures strongly control FDR. Those with C = G (number of genes or tests) and C = 0 are, respectively, the Bonferroni procedure and the single-testing procedure. These are the two extreme procedures in this family. To let one choose an appropriate multiple-testing procedure in practice, we develop an algorithm by which FDR can be correctly and reliably estimated. Simulated results show that our method works well for an accurate estimation of FDR in various scenarios, and we illustrate the applications of our method with three real datasets.

AVAILABILITY AND IMPLEMENTATION

Our program is implemented in Matlab and is available upon request.
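To make the two reference points named in the abstract concrete, the following is a minimal Python sketch of the standard Bonferroni and Benjamini-Hochberg (BH) step-up procedures. It does not implement the authors' C-value family, whose formula is not given in this abstract, and it is not the authors' Matlab program. The function names and the p-values are illustrative assumptions.

```python
from typing import List


def bonferroni(pvalues: List[float], alpha: float = 0.05) -> List[bool]:
    """Reject test i when p_i <= alpha / G (controls the family-wise error rate)."""
    g = len(pvalues)
    return [p <= alpha / g for p in pvalues]


def benjamini_hochberg(pvalues: List[float], alpha: float = 0.05) -> List[bool]:
    """Standard BH step-up rule: find the largest rank k such that
    p_(k) <= k * alpha / G, then reject the k smallest p-values."""
    g = len(pvalues)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(g), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / g:
            k_max = rank  # keep the largest rank that passes
    reject = [False] * g
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical p-values for G = 10 genes (illustration only).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(sum(bonferroni(pvals)), sum(benjamini_hochberg(pvals)))
```

On these made-up p-values BH rejects at least as many hypotheses as Bonferroni, which is the usual trade-off between controlling FDR and controlling the family-wise error rate.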

Similar articles

Work efficiency: a new criterion for comprehensive comparison and evaluation of statistical methods in large-scale identification of differentially expressed genes.

Genomics. 2011 Nov;98(5):390-9. doi: 10.1016/j.ygeno.2011.05.006. Epub 2011 Jun 30.

Identifying differentially expressed genes using false discovery rate controlling procedures.

Bioinformatics. 2003 Feb 12;19(3):368-75. doi: 10.1093/bioinformatics/btf877.

An adaptive single-step FDR procedure with applications to DNA microarray analysis.

Biom J. 2007 Feb;49(1):127-35. doi: 10.1002/bimj.200610316.

A classification approach for DNA methylation profiling with bisulfite next-generation sequencing data.

Bioinformatics. 2014 Jan 15;30(2):172-9. doi: 10.1093/bioinformatics/btt674. Epub 2013 Nov 21.

Quick calculation for sample size while controlling false discovery rate with application to microarray analysis.

Bioinformatics. 2007 Mar 15;23(6):739-46. doi: 10.1093/bioinformatics/btl664. Epub 2007 Jan 19.

Estimation of false discovery proportion under general dependence.

Bioinformatics. 2006 Dec 15;22(24):3025-31. doi: 10.1093/bioinformatics/btl527. Epub 2006 Oct 17.

Re-sampling strategy to improve the estimation of number of null hypotheses in FDR control under strong correlation structures.

BMC Bioinformatics. 2007 May 18;8:157. doi: 10.1186/1471-2105-8-157.

An investigation on performance of Significance Analysis of Microarray (SAM) for the comparisons of several treatments with one control in the presence of small-variance genes.

Biom J. 2008 Oct;50(5):801-23. doi: 10.1002/bimj.200710467.

FDR control by the BH procedure for two-sided correlated tests with implications to gene expression data analysis.

Biom J. 2007 Feb;49(1):107-26. doi: 10.1002/bimj.200510313.

Cited by

Genomic Signatures of Environmental Adaptation in (Fagaceae).

Plants (Basel). 2025 Apr 5;14(7):1128. doi: 10.3390/plants14071128.

The Significant Effects of Threshold Selection for Advancing Nitrogen Use Efficiency in Whole Genome of Bread Wheat.

Plant Direct. 2025 Jan 21;9(1):e70036. doi: 10.1002/pld3.70036. eCollection 2025 Jan.

High intraperitoneal interleukin-6 levels predict ultrafiltration (UF) insufficiency in peritoneal dialysis patients: A prospective cohort study.

Front Med (Lausanne). 2022 Aug 10;9:836861. doi: 10.3389/fmed.2022.836861. eCollection 2022.

Null-free False Discovery Rate Control Using Decoy Permutations.

Acta Math Appl Sin. 2022;38(2):235-253. doi: 10.1007/s10255-022-1077-5. Epub 2022 Apr 9.

COL1A1 Is a Potential Prognostic Biomarker and Correlated with Immune Infiltration in Mesothelioma.

Biomed Res Int. 2021 Jan 4;2021:5320941. doi: 10.1155/2021/5320941. eCollection 2021.

Claudin-6 is a single prognostic marker and functions as a tumor-promoting gene in a subgroup of intestinal type gastric cancer.

Gastric Cancer. 2020 May;23(3):403-417. doi: 10.1007/s10120-019-01014-x. Epub 2019 Oct 25.

lncRNA-ATB functions as a competing endogenous RNA to promote YAP1 by sponging miR-590-5p in malignant melanoma.

Int J Oncol. 2018 Sep;53(3):1094-1104. doi: 10.3892/ijo.2018.4454. Epub 2018 Jun 25.

Retinal metabolic events in preconditioning light stress as revealed by wide-spectrum targeted metabolomics.

Metabolomics. 2017;13(3):22. doi: 10.1007/s11306-016-1156-9. Epub 2017 Jan 20.

Expression analysis of apolipoprotein E and its associated genes in gastric cancer.

Oncol Lett. 2015 Sep;10(3):1309-1314. doi: 10.3892/ol.2015.3447. Epub 2015 Jul 1.

Mantle Branch-Specific RNA Sequences of Moon Scallop Amusium pleuronectes to Identify Shell Color-Associated Genes.

PLoS One. 2015 Oct 23;10(10):e0141390. doi: 10.1371/journal.pone.0141390. eCollection 2015.
