Ruan Jianhua, Dean Angela K, Zhang Weixiong
Department of Computer Science, The University of Texas at San Antonio, One UTSA Circle, San Antonio, TX 78249, USA.
BMC Syst Biol. 2010 Feb 2;4:8. doi: 10.1186/1752-0509-4-8.
Co-expression network-based approaches have become popular for analyzing microarray data, for example to detect functional gene modules. However, co-expression networks are often constructed by ad hoc methods, and network-based analyses have not been shown to outperform conventional cluster analyses, partly because no unbiased evaluation metric is available.
Here, we develop a general co-expression network-based approach for analyzing both genes and samples in microarray data. Our approach consists of a simple but robust rank-based network construction method, a parameter-free module discovery algorithm and a novel reference network-based metric for module evaluation. We report some interesting topological properties of rank-based co-expression networks that are very different from those of value-based networks in the literature. Using a large set of synthetic and real microarray data, we demonstrate the superior performance of our approach over several popular existing algorithms. Applications of our approach to yeast, Arabidopsis and human cancer microarray data reveal many interesting modules, including a fatal subtype of lymphoma and a gene module regulating yeast telomere integrity, which were missed by the existing methods.
We demonstrated that our novel approach is very effective in discovering the modular structures in microarray data, both for genes and for samples. As the method is essentially parameter-free, it may be applied to large data sets where the number of clusters is difficult to estimate. The method is also very general and can be applied to other types of data. A MATLAB implementation of our algorithm can be downloaded from http://cs.utsa.edu/~jruan/Software.html.
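As an illustration of what a rank-based construction could look like, the sketch below links each gene to the d genes it is most correlated with, instead of applying a global correlation cutoff as a value-based network would. The abstract does not specify the procedure, so the Pearson similarity, the parameter `d`, the symmetrization step and the function name `rank_based_network` are all assumptions made for this example; this is not the authors' MATLAB implementation.

```python
import numpy as np


def rank_based_network(expr, d=3):
    """Hypothetical sketch of a rank-based co-expression network.

    expr : 2-D array, rows are genes and columns are samples.
    d    : number of top-ranked neighbors kept per gene (assumed parameter).
    Returns a symmetric boolean adjacency matrix.
    """
    expr = np.asarray(expr, dtype=float)
    corr = np.corrcoef(expr)             # gene-by-gene Pearson correlation
    np.fill_diagonal(corr, -np.inf)      # a gene is never its own neighbor
    n = corr.shape[0]

    # For every gene, indices of its d most similar genes (rank, not value).
    top = np.argsort(-corr, axis=1)[:, :d]

    adj = np.zeros((n, n), dtype=bool)
    rows = np.repeat(np.arange(n), d)
    adj[rows, top.ravel()] = True

    # Assumption: keep an edge if either endpoint ranks the other in its top d.
    return adj | adj.T
```

Because edges depend on ranks rather than on absolute correlation values, every gene gets at least d neighbors regardless of how strongly it is correlated with the rest of the data set, which is one plausible reason such networks differ topologically from thresholded ones.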