High throughput screening of co-expressed gene pairs with controlled false discovery rate (FDR) and minimum acceptable strength (MAS).

Authors

Zhu Dongxiao, Hero Alfred O, Qin Zhaohui S, Swaroop Anand

Affiliation

Bioinformatics Program, University of Michigan, Ann Arbor, MI 48109, USA.

Publication information

J Comput Biol. 2005 Sep;12(7):1029-45. doi: 10.1089/cmb.2005.12.1029.

DOI:10.1089/cmb.2005.12.1029

PMID:16201920

Abstract

Many exploratory microarray data analysis tools, such as gene clustering and relevance networks, rely on detecting pairwise gene co-expression. Traditional screening of pairwise co-expression controls either biological significance or statistical significance, but not both. The former approach does not provide stochastic error control, and the latter approach screens many co-expressions with excessively low correlation. We have designed and implemented a statistically sound two-stage co-expression detection algorithm that controls both the statistical significance (false discovery rate, FDR) and the biological significance (minimum acceptable strength, MAS) of the discovered co-expressions. Based on estimation of pairwise gene correlation, the algorithm provides an initial co-expression discovery that controls only FDR, followed by a second-stage co-expression discovery that controls both FDR and MAS. It also computes and thresholds the set of FDR p-values for each correlation that satisfies the MAS criterion. Using simulated data, we validated the asymptotic null distributions of the Pearson and Kendall correlation coefficients and the two-stage error-control procedure; we also compared our two-stage test procedure with another two-stage test procedure using the receiver operating characteristic (ROC) curve. We then used yeast galactose metabolism data to illustrate the advantage of our method for clustering genes and constructing a relevance network. The method has been implemented in an R package, "GeneNT", that is freely available from the Comprehensive R Archive Network (CRAN): www.cran.r-project.org/.
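The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the GeneNT implementation, and the abstract does not specify these details: the sketch assumes Pearson correlation with a Fisher z-transform normal approximation for the null distribution, a Benjamini-Hochberg step-up rule for stage 1, and FDR-adjusted confidence intervals (in the style of Benjamini and Yekutieli) for stage 2, where a pair is kept only if its interval lies entirely outside [-MAS, MAS]. The names `pearson`, `two_stage_screen`, `expr`, `fdr`, and `mas` are hypothetical.

```python
import math
from itertools import combinations
from statistics import NormalDist


def pearson(x, y):
    """Sample Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def two_stage_screen(expr, fdr=0.05, mas=0.5):
    """Illustrative two-stage screen (assumed construction, see lead-in).

    expr: dict mapping gene name -> list of expression values (n > 3 samples).
    Returns a list of (gene1, gene2, r) for pairs passing both stages.
    """
    genes = sorted(expr)
    n = len(expr[genes[0]])
    se = 1.0 / math.sqrt(n - 3)  # asymptotic std. error of Fisher's z

    pairs = []
    for g, h in combinations(genes, 2):
        r = pearson(expr[g], expr[h])
        r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
        z = math.atanh(r)
        # Two-sided p-value for H0: correlation = 0 under the normal approximation.
        p = math.erfc(abs(z) / se / math.sqrt(2))
        pairs.append((g, h, r, z, p))

    # Stage 1: Benjamini-Hochberg step-up, controlling FDR only.
    m = len(pairs)
    pairs.sort(key=lambda t: t[4])
    k = 0
    for i, t in enumerate(pairs, start=1):
        if t[4] <= fdr * i / m:
            k = i
    if k == 0:
        return []
    stage1 = pairs[:k]

    # Stage 2: intervals at adjusted level 1 - fdr * k / m; keep a pair only
    # if its whole interval lies outside [-mas, mas].
    zcrit = NormalDist().inv_cdf(1 - fdr * k / (2 * m))
    selected = []
    for g, h, r, z, _ in stage1:
        lo = math.tanh(z - zcrit * se)
        hi = math.tanh(z + zcrit * se)
        if lo > mas or hi < -mas:
            selected.append((g, h, round(r, 3)))
    return selected


# Toy usage with made-up profiles (not data from the paper).
toy = {
    "a": [float(i) for i in range(20)],
    "b": [2.0 * i + 1.0 for i in range(20)],
    "c": [float((-1) ** i) for i in range(20)],
}
print(two_stage_screen(toy, fdr=0.05, mas=0.5))
```

Setting `mas=0.0` reduces stage 2 to requiring that the interval exclude zero, so raising `mas` shows how weak but statistically significant correlations are filtered out. A rank-based coefficient such as Kendall's, which the abstract also mentions, would need its own null distribution and interval construction.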

Similar articles

Network constrained clustering for gene microarray data.

Bioinformatics. 2005 Nov 1;21(21):4014-20. doi: 10.1093/bioinformatics/bti655. Epub 2005 Sep 1.

Leveraging two-way probe-level block design for identifying differential gene expression with high-density oligonucleotide arrays.

BMC Bioinformatics. 2004 Apr 20;5:42. doi: 10.1186/1471-2105-5-42.

Robust estimation of the false discovery rate.

Bioinformatics. 2006 Aug 15;22(16):1979-87. doi: 10.1093/bioinformatics/btl328. Epub 2006 Jun 15.

Rank-invariant resampling based estimation of false discovery rate for analysis of small sample microarray data.

BMC Bioinformatics. 2005 Jul 22;6:187. doi: 10.1186/1471-2105-6-187.

Determination of the differentially expressed genes in microarray experiments using local FDR.

BMC Bioinformatics. 2004 Sep 6;5:125. doi: 10.1186/1471-2105-5-125.

A note on using permutation-based false discovery rate estimates to compare different analysis methods for microarray data.

Bioinformatics. 2005 Dec 1;21(23):4280-8. doi: 10.1093/bioinformatics/bti685. Epub 2005 Sep 27.

Empirical Bayes screening of many p-values with applications to microarray studies.

Bioinformatics. 2005 May 1;21(9):1987-94. doi: 10.1093/bioinformatics/bti301. Epub 2005 Feb 2.

A stochastic downhill search algorithm for estimating the local false discovery rate.

IEEE/ACM Trans Comput Biol Bioinform. 2004 Jul-Sep;1(3):98-108. doi: 10.1109/TCBB.2004.24.

Work efficiency: a new criterion for comprehensive comparison and evaluation of statistical methods in large-scale identification of differentially expressed genes.

Genomics. 2011 Nov;98(5):390-9. doi: 10.1016/j.ygeno.2011.05.006. Epub 2011 Jun 30.

Cited by

Metabolomics Applied to Cord Serum in Preeclampsia Newborns: Implications for Neonatal Outcomes.

Front Pediatr. 2022 Apr 25;10:869381. doi: 10.3389/fped.2022.869381. eCollection 2022.

Chemical Constituents and Molecular Mechanism of the Yellow Phenotype of Yellow Mushroom ().

J Fungi (Basel). 2022 Mar 18;8(3):314. doi: 10.3390/jof8030314.

Functional MYB transcription factor gene HtMYB2 is associated with anthocyanin biosynthesis in Helianthus tuberosus L.

BMC Plant Biol. 2020 Jun 1;20(1):247. doi: 10.1186/s12870-020-02463-8.

Variations in Nitrogen Metabolism are Closely Linked with Nitrogen Uptake and Utilization Efficiency in Cotton Genotypes under Various Nitrogen Supplies.

Plants (Basel). 2020 Feb 15;9(2):250. doi: 10.3390/plants9020250.

Physiological and Transcriptomic Changes during the Early Phases of Adventitious Root Formation in Mulberry Stem Hardwood Cuttings.

Int J Mol Sci. 2019 Jul 29;20(15):3707. doi: 10.3390/ijms20153707.

Time-Course Investigation of Small Molecule Metabolites in MAP-Stored Red Blood Cells Using UPLC-QTOF-MS.

Molecules. 2018 Apr 16;23(4):923. doi: 10.3390/molecules23040923.

Large-Scale Multiple Testing of Correlations.

J Am Stat Assoc. 2016;111(513):229-240. doi: 10.1080/01621459.2014.999157. Epub 2016 May 5.

Foundational Principles for Large-Scale Inference: Illustrations Through Correlation Mining.

Proc IEEE Inst Electr Electron Eng. 2016 Jan;104(1):93-110. doi: 10.1109/JPROC.2015.2494178. Epub 2015 Dec 21.

Network inference through synergistic subnetwork evolution.

EURASIP J Bioinform Syst Biol. 2015 Nov 27;2015(1):12. doi: 10.1186/s13637-015-0027-4. eCollection 2015 Dec.

A null model for Pearson coexpression networks.

PLoS One. 2015 Jun 1;10(6):e0128115. doi: 10.1371/journal.pone.0128115. eCollection 2015.
