Newberg Lee A, Thompson William A, Conlan Sean, Smith Thomas M, McCue Lee Ann, Lawrence Charles E
The Wadsworth Center, New York State Department of Health, Albany, NY 12201, USA.
Bioinformatics. 2007 Jul 15;23(14):1718-27. doi: 10.1093/bioinformatics/btm241. Epub 2007 May 8.
Identification of functionally conserved regulatory elements in sequence data from closely related organisms is becoming feasible, due to the rapid growth of public sequence databases. Closely related organisms are most likely to have common regulatory motifs; however, the recent speciation of such organisms results in a high degree of correlation in their genome sequences, confounding the detection of functional elements. Additionally, alignment algorithms that use optimization techniques are limited to the detection of a single alignment that may not be representative. Comparative-genomics studies must be able to address the phylogenetic correlation in the data and efficiently explore the alignment space, in order to make specific and biologically relevant predictions.
We describe here a Gibbs sampler that employs a full phylogenetic model and reports an ensemble centroid solution. We describe regulatory motif detection using both simulated and real data, and demonstrate that this approach achieves improved specificity, sensitivity, and positive predictive value over non-phylogenetic algorithms, and over phylogenetic algorithms that report a maximum likelihood solution.
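To illustrate the difference between a centroid solution and a single most probable solution, the following is a minimal sketch and not the authors' implementation. It assumes each Gibbs sample has been reduced to a binary vector marking which sequence positions are called as sites, and it ignores any constraints a real sampler places on valid site configurations. Under those assumptions, the vector that minimizes total Hamming distance to the samples keeps exactly the positions called in more than half of them. The function names and the toy samples are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def centroid_sites(samples: Sequence[Sequence[int]]) -> List[int]:
    """Return the binary vector minimizing total Hamming distance to the samples.

    Assumption: samples are unconstrained 0/1 site-indicator vectors of equal
    length, so the minimizer is a per-position majority vote (ties give 0).
    """
    if not samples:
        raise ValueError("need at least one sample")
    n = len(samples)
    length = len(samples[0])
    counts = [0] * length
    for sample in samples:
        if len(sample) != length:
            raise ValueError("all samples must have the same length")
        for i, bit in enumerate(sample):
            counts[i] += bit
    # Keep a position only if it is a site in more than half of the samples.
    return [1 if 2 * c > n else 0 for c in counts]


def most_frequent_sample(samples: Sequence[Sequence[int]]) -> List[int]:
    """Return the single most frequently sampled vector (a stand-in for a mode)."""
    return list(Counter(tuple(s) for s in samples).most_common(1)[0][0])


def total_hamming(candidate: Sequence[int], samples: Sequence[Sequence[int]]) -> int:
    """Sum of Hamming distances from the candidate to every sample."""
    return sum(sum(a != b for a, b in zip(candidate, s)) for s in samples)


if __name__ == "__main__":
    # Hypothetical toy ensemble: position 0 is usually a site, but the most
    # frequent individual sample contains no sites at all.
    toy = [[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
    print(centroid_sites(toy), total_hamming(centroid_sites(toy), toy))
    print(most_frequent_sample(toy), total_hamming(most_frequent_sample(toy), toy))
```

In this toy ensemble the most frequent sample misses the site at position 0 that three of the five samples support, while the centroid retains it and lies closer to the ensemble as a whole.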
The software is freely available at http://bayesweb.wadsworth.org/gibbs/gibbs.html.
Supplementary data are available at Bioinformatics online.