Jensen Shane T, Shen Lei, Liu Jun S
Department of Statistics, The Wharton School, University of Pennsylvania, USA.
Bioinformatics. 2005 Oct 15;21(20):3832-9. doi: 10.1093/bioinformatics/bti628. Epub 2005 Aug 16.
We present a sequence-based framework and algorithm, PHYLOCLUS, for predicting co-regulated genes. In our approach, de novo discovery methods are used to find evolutionarily conserved motifs, and a Bayesian hierarchical clustering model is then used to cluster these motifs, thereby grouping together genes that are putatively co-regulated. Our clustering procedure treats both the number of clusters and the motif width within each cluster as unknown.
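This abstract does not spell out the PHYLOCLUS hierarchical model. The Python below is a minimal sketch of how a Bayesian clustering of motifs can leave the number of clusters unknown. It substitutes a Chinese restaurant process (Dirichlet-process) mixture, a standard device that is not claimed to be the paper's prior. Each motif is summarized by the base counts of its aligned sites, and all motifs in one cluster are assumed to share a single position weight matrix with a symmetric Dirichlet(alpha) prior on each column. Unlike PHYLOCLUS, the sketch fixes one common motif width and ignores orientation and site offsets. All function names (`count_matrix`, `pool`, `log_marginal`, `cluster_motifs`), the parameters `gamma` and `alpha`, and the toy sites are hypothetical.

```python
import math
import random

# Hypothetical sketch: a CRP mixture stand-in, not the PHYLOCLUS model itself.
BASES = "ACGT"


def count_matrix(sites):
    """Column-wise A/C/G/T counts for equal-width, pre-aligned sites."""
    counts = [[0] * 4 for _ in range(len(sites[0]))]
    for site in sites:
        for j, base in enumerate(site):
            counts[j][BASES.index(base)] += 1
    return counts


def pool(matrices):
    """Sum several count matrices position by position (assumes a shared width)."""
    width = len(matrices[0])
    return [[sum(m[j][b] for m in matrices) for b in range(4)] for j in range(width)]


def log_marginal(counts, alpha=1.0):
    """Log probability of the sites behind `counts` when each column's base
    distribution is integrated out under a symmetric Dirichlet(alpha) prior."""
    total = 0.0
    for col in counts:
        total += math.lgamma(4 * alpha) - math.lgamma(4 * alpha + sum(col))
        total += sum(math.lgamma(alpha + c) - math.lgamma(alpha) for c in col)
    return total


def cluster_motifs(motifs, gamma=1.0, alpha=1.0, sweeps=50, seed=0):
    """Collapsed Gibbs sampler for a Chinese-restaurant-process mixture of motifs.
    The number of clusters is sampled rather than fixed in advance."""
    rng = random.Random(seed)
    labels = list(range(len(motifs)))  # start with one cluster per motif
    for _ in range(sweeps):
        for i, motif in enumerate(motifs):
            labels[i] = None  # take motif i out of its current cluster
            clusters = {}
            for k, lab in enumerate(labels):
                if lab is not None:
                    clusters.setdefault(lab, []).append(motifs[k])
            options, log_w = [], []
            for lab, members in clusters.items():
                # CRP weight (cluster size) times predictive likelihood of motif i
                gain = (log_marginal(pool(members + [motif]), alpha)
                        - log_marginal(pool(members), alpha))
                options.append(lab)
                log_w.append(math.log(len(members)) + gain)
            # Opening a new cluster: weight gamma times the motif's own marginal
            options.append(max(clusters, default=-1) + 1)
            log_w.append(math.log(gamma) + log_marginal(motif, alpha))
            top = max(log_w)
            weights = [math.exp(w - top) for w in log_w]
            labels[i] = rng.choices(options, weights=weights)[0]
    # Relabel clusters 0, 1, 2, ... in order of first appearance
    canon = {}
    return [canon.setdefault(lab, len(canon)) for lab in labels]


# Toy usage: three TTGACA-like motifs and three TATAAT-like motifs (made-up sites)
toy_a = [["TTGACA"] * 8, ["TTGACA"] * 7 + ["TTGACT"], ["TTGACA"] * 6 + ["TTTACA"] * 2]
toy_b = [["TATAAT"] * 8, ["TATAAT"] * 7 + ["TATACT"], ["TATAAT"] * 6 + ["TAAAAT"] * 2]
labels = cluster_motifs([count_matrix(s) for s in toy_a + toy_b])
print(labels)  # expected: [0, 0, 0, 1, 1, 1]
```

The design choice here is collapsing out each cluster's weight matrix. Motif similarity then enters only through the Dirichlet-multinomial marginal likelihood, and the concentration parameter `gamma` controls how readily a new cluster is opened.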
We apply our framework to predict co-regulated genes in the bacterium Bacillus subtilis by comparison with six other closely related bacterial species. Our predicted motifs and gene clusters are validated against several external sources, and significant clusters are examined in detail. An extension of the method to the discovery and clustering of two-block motifs can be used to infer synergistic binding relationships between transcription factors.
Software and Supplementary Materials can be downloaded at http://stat.wharton.upenn.edu/~stjensen/research/phyloclus.html or http://www.fas.harvard.edu/~junliu/phyloclus.html