Liu X, Sivaganesan S, Yeung KY, Guo J, Bumgarner RE, Medvedovic M
Department of Environmental Health, University of Cincinnati, 3223 Eden Avenue ML 56, Cincinnati, OH 45267, USA.
Bioinformatics. 2006 Jul 15;22(14):1737-44. doi: 10.1093/bioinformatics/btl184. Epub 2006 May 18.
Identifying groups of co-regulated genes by monitoring their expression over various experimental conditions is complicated by the fact that such co-regulation is condition-specific. Ignoring the context-specific nature of co-regulation significantly reduces the ability of clustering procedures to detect co-expressed genes due to additional 'noise' introduced by non-informative measurements.
We have developed a novel Bayesian hierarchical model and corresponding computational algorithms for clustering gene expression profiles across diverse experimental conditions and studies that accounts for context-specificity of gene expression patterns. The model is based on the Bayesian infinite mixtures framework and does not require a priori specification of the number of clusters. We demonstrate that explicit modeling of context-specificity results in increased accuracy of the cluster analysis by examining the specificity and sensitivity of clusters in microarray data. We also demonstrate that probabilities of co-expression derived from the posterior distribution of clusterings are valid estimates of statistical significance of created clusters.
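The co-expression probabilities mentioned above can be illustrated with a minimal sketch. It assumes that a sampler (for example, a Gibbs sampler for an infinite mixture model) has already produced a set of clusterings drawn from the posterior, and it estimates the probability that two genes are co-expressed as the fraction of sampled clusterings in which they share a cluster. The function name `coexpression_probabilities` and the toy label matrix are hypothetical; this is not the gimm package's interface, and the sampler itself is not shown.

```python
import numpy as np


def coexpression_probabilities(label_samples):
    """Estimate pairwise co-clustering probabilities from sampled clusterings.

    label_samples: array-like of shape (n_samples, n_genes); entry [s, g] is
    the cluster label of gene g in posterior sample s. Labels only need to be
    consistent within a sample, not across samples.
    """
    labels = np.asarray(label_samples)
    # same[s, i, j] is True when genes i and j share a cluster in sample s.
    same = labels[:, :, None] == labels[:, None, :]
    # Averaging over samples gives the Monte Carlo estimate for each gene pair.
    return same.mean(axis=0)


# Toy input (made up for illustration): 4 sampled clusterings of 4 genes.
samples = np.array([
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [2, 2, 5, 5],
    [0, 1, 1, 1],
])
probs = coexpression_probabilities(samples)
```

Because each sampled clustering may contain a different number of clusters, the resulting matrix summarizes the posterior without fixing the number of clusters in advance. A matrix of this kind could then be passed to a standard hierarchical clustering routine, with one minus the probability used as a distance.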
The open-source package gimm is available at http://eh3.uc.edu/gimm.