High throughput screening of co-expressed gene pairs with controlled false discovery rate (FDR) and minimum acceptable strength (MAS).

Authors

Zhu Dongxiao, Hero Alfred O, Qin Zhaohui S, Swaroop Anand

Affiliation

Bioinformatics Program, University of Michigan, Ann Arbor, MI 48109, USA.

Publication

J Comput Biol. 2005 Sep;12(7):1029-45. doi: 10.1089/cmb.2005.12.1029.

Abstract

Many exploratory microarray data analysis tools, such as gene clustering and relevance networks, rely on detecting pairwise gene co-expression. Traditional screening of pairwise co-expression controls either biological significance or statistical significance, but not both. The former approach does not provide stochastic error control, and the latter approach screens many co-expressions with excessively low correlation. We have designed and implemented a statistically sound two-stage co-expression detection algorithm that controls both the statistical significance (false discovery rate, FDR) and the biological significance (minimum acceptable strength, MAS) of the discovered co-expressions. Based on estimation of pairwise gene correlation, the algorithm provides an initial co-expression discovery that controls only FDR, followed by a second-stage co-expression discovery that controls both FDR and MAS. It also computes and thresholds the set of FDR p-values for each correlation that satisfies the MAS criterion. Using simulated data, we validated the asymptotic null distributions of the Pearson and Kendall correlation coefficients and the two-stage error-control procedure; we also compared our two-stage test procedure with another two-stage test procedure using the receiver operating characteristic (ROC) curve. We then used yeast galactose metabolism data to illustrate the advantage of our method for clustering genes and constructing a relevance network. The method has been implemented in an R package, "GeneNT", that is freely available from the Comprehensive R Archive Network (CRAN): www.cran.r-project.org/.
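
The two-stage idea can be illustrated with a minimal Python sketch. The abstract does not spell out the mechanics, so everything below is an assumption made for illustration: stage 1 uses the Fisher z-transform normal approximation for the Pearson correlation with a Benjamini-Hochberg step-up rule, and stage 2 keeps a pair only if an FDR-adjusted confidence interval for its correlation lies entirely beyond the MAS threshold. The names `pearson` and `two_stage_screen` are hypothetical and are not part of the GeneNT API.

```python
import math
from itertools import combinations
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for p-values and quantiles


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def two_stage_screen(expr, fdr=0.05, mas=0.5):
    """Hypothetical two-stage screen (illustrative, not the GeneNT code).

    expr: dict mapping gene name -> list of expression values (length n > 3).
    Returns (stage1, stage2), each a list of (gene1, gene2, r) tuples.
    """
    genes = sorted(expr)
    n = len(expr[genes[0]])
    scale = math.sqrt(n - 3)  # 1 / standard error of Fisher's z (assumed model)

    stats = []
    for g1, g2 in combinations(genes, 2):
        r = pearson(expr[g1], expr[g2])
        r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
        z = math.atanh(r)
        p = 2.0 * (1.0 - _STD_NORMAL.cdf(abs(z) * scale))  # H0: rho = 0
        stats.append((p, g1, g2, r, z))

    # Stage 1: Benjamini-Hochberg step-up, controls FDR only.
    stats.sort()
    m = len(stats)
    k = 0
    for i, entry in enumerate(stats, start=1):
        if entry[0] <= fdr * i / m:
            k = i
    stage1 = stats[:k]
    if k == 0:
        return [], []

    # Stage 2: confidence intervals at the adjusted level 1 - k * fdr / m;
    # keep pairs whose whole interval lies outside [-mas, mas].
    alpha_adj = k * fdr / m
    half = -_STD_NORMAL.inv_cdf(alpha_adj / 2.0) / scale
    stage2 = []
    for _, g1, g2, r, z in stage1:
        lo, hi = math.tanh(z - half), math.tanh(z + half)
        if lo > mas or hi < -mas:
            stage2.append((g1, g2, r))

    return [(g1, g2, r) for _, g1, g2, r, _ in stage1], stage2


if __name__ == "__main__":
    # Toy data: "b" tracks "a" exactly; "d" is only weakly related to "a".
    a = [float(i) for i in range(200)]
    toy = {
        "a": a,
        "b": [2.0 * v + 1.0 for v in a],
        "d": [v + 180.0 * (-1) ** i for i, v in enumerate(a)],
    }
    first, second = two_stage_screen(toy, fdr=0.05, mas=0.5)
    print("stage 1 (FDR only):", [(g1, g2) for g1, g2, _ in first])
    print("stage 2 (FDR + MAS):", [(g1, g2) for g1, g2, _ in second])
```

In the toy data, the weakly correlated pairs are statistically significant because of the sample size, so they survive the FDR-only stage, but their confidence intervals fall below a MAS of 0.5, so they are dropped in the second stage. This mirrors the abstract's point that controlling statistical significance alone admits co-expressions with low correlation.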
