PhyloGibbs: a Gibbs sampling motif finder that incorporates phylogeny.

Authors

Siddharthan Rahul, Siggia Eric D, van Nimwegen Erik

Affiliation

Center for Studies in Physics and Biology, The Rockefeller University, New York, New York, United States of America.

Publication information

PLoS Comput Biol. 2005 Dec;1(7):e67. doi: 10.1371/journal.pcbi.0010067. Epub 2005 Dec 9.

Abstract

A central problem in the bioinformatics of gene regulation is to find the binding sites for regulatory proteins. One of the most promising approaches toward identifying these short and fuzzy sequence patterns is the comparative analysis of orthologous intergenic regions of related species. This analysis is complicated by various factors. First, one needs to take the phylogenetic relationship between the species into account in order to distinguish conservation that is due to the occurrence of functional sites from spurious conservation that is due to evolutionary proximity. Second, one has to deal with the complexities of multiple alignments of orthologous intergenic regions, and one has to consider the possibility that functional sites may occur outside of conserved segments. Here we present a new motif sampling algorithm, PhyloGibbs, that runs on arbitrary collections of multiple local sequence alignments of orthologous sequences. The algorithm searches over all ways in which an arbitrary number of binding sites for an arbitrary number of transcription factors (TFs) can be assigned to the multiple sequence alignments. These binding site configurations are scored by a Bayesian probabilistic model that treats aligned sequences by a model for the evolution of binding sites and "background" intergenic DNA. This model takes the phylogenetic relationship between the species in the alignment explicitly into account. The algorithm uses simulated annealing and Monte Carlo Markov-chain sampling to rigorously assign posterior probabilities to all the binding sites that it reports. In tests on synthetic data and real data from five Saccharomyces species our algorithm performs significantly better than four other motif-finding algorithms, including algorithms that also take phylogeny into account. Our results also show that, in contrast to the other algorithms, PhyloGibbs can make realistic estimates of the reliability of its predictions. 
Our tests suggest that, running on the five-species multiple alignment of a single gene's upstream region, PhyloGibbs on average recovers over 50% of all binding sites in S. cerevisiae at a specificity of about 50%, and 33% of all binding sites at a specificity of about 85%. We also tested PhyloGibbs on collections of multiple alignments of intergenic regions that were recently annotated, based on ChIP-on-chip data, to contain binding sites for the same TF. We compared PhyloGibbs's results with the previous analysis of these data using six other motif-finding algorithms. For 16 of 21 TFs for which all other motif-finding methods failed to find a significant motif, PhyloGibbs did recover a motif that matches the literature consensus. In 11 cases where there was disagreement in the results we compiled lists of known target genes from the literature, and found that running PhyloGibbs on their regulatory regions yielded a binding motif matching the literature consensus in all but one of the cases. Interestingly, these literature gene lists had little overlap with the targets annotated based on the ChIP-on-chip data. The PhyloGibbs code can be downloaded from http://www.biozentrum.unibas.ch/~nimwegen/cgi-bin/phylogibbs.cgi or http://www.imsc.res.in/~rsidd/phylogibbs. The full set of predicted sites from our tests on yeast are available at http://www.swissregulon.unibas.ch.
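As a rough illustration of the basic Gibbs sampling step that motif samplers of this kind build on, the following is a minimal Python sketch. It is not the PhyloGibbs implementation: it ignores phylogeny and multiple alignments, assumes a single motif of fixed width with exactly one site per sequence, assumes a uniform background, and omits simulated annealing and the tracking of posterior probabilities. All function names and parameters here are hypothetical.

```python
import math
import random

ALPHABET = "ACGT"

def column_counts(sites, width, pseudocount=1.0):
    """Per-column base counts (plus pseudocounts) over the given motif sites."""
    counts = [{b: pseudocount for b in ALPHABET} for _ in range(width)]
    for site in sites:
        for i, base in enumerate(site):
            counts[i][base] += 1.0
    return counts

def window_weight(window, counts, background):
    """Likelihood ratio of a window under the motif model vs. the background."""
    log_ratio = 0.0
    for i, base in enumerate(window):
        total = sum(counts[i].values())
        log_ratio += math.log(counts[i][base] / total) - math.log(background[base])
    return math.exp(log_ratio)

def gibbs_motif_search(seqs, width, sweeps=200, seed=0):
    """Sample one motif site per sequence; return the final start positions."""
    rng = random.Random(seed)
    background = {b: 0.25 for b in ALPHABET}  # assumption: uniform background
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(sweeps):
        for k, seq in enumerate(seqs):
            # Build the motif model from every sequence except the k-th.
            others = [
                s[p:p + width]
                for j, (s, p) in enumerate(zip(seqs, starts))
                if j != k
            ]
            counts = column_counts(others, width)
            # Resample the k-th site in proportion to its likelihood ratio.
            positions = list(range(len(seq) - width + 1))
            weights = [
                window_weight(seq[p:p + width], counts, background)
                for p in positions
            ]
            starts[k] = rng.choices(positions, weights=weights, k=1)[0]
    return starts
```

Calling `gibbs_motif_search(["ACGTACGTAC", "TTTTACGTTT", "GGACGTGGGG"], width=4)` returns one sampled start position per sequence. A phylogeny-aware sampler as described in the abstract would instead score aligned columns under an evolutionary model and allow any number of sites and motifs, none of which this sketch attempts.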

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/82ef/1323456/df8dfffe6e6b/pcbi.0010067.g001.jpg
