Berezikov Eugene, Guryev Victor, Plasterk Ronald H A, Cuppen Edwin
Hubrecht Laboratory, Netherlands Institute for Developmental Biology, 3584 CT, Utrecht, The Netherlands.
Genome Res. 2004 Jan;14(1):170-8. doi: 10.1101/gr.1642804. Epub 2003 Dec 12.
Prediction of transcription-factor target sites in promoters remains difficult because the target sequences are short and degenerate. Although orthologous sequences and phylogenetic footprinting approaches may help in recognizing conserved and potentially functional sequences, established algorithms can fail to align short transcription-factor binding sites correctly, especially when the species being aligned are more divergent. Here, we report a novel phylogenetic footprinting approach, CONREAL, that uses biologically relevant information, namely potential transcription-factor binding sites as represented by positional weight matrices, to establish anchors between orthologous sequences and to guide promoter sequence alignment. Comparison of CONREAL with the global alignment programs LAGAN and AVID on a reference data set shows that CONREAL performs equally well for closely related species such as rodents and human, and has a clear added value for aligning promoter elements of more divergent species such as human and fish, as it identifies conserved transcription-factor binding sites that are not found by the other methods. CONREAL is accessible via a Web interface at http://conreal.niob.knaw.nl/.
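The anchoring idea described in the abstract can be illustrated with a minimal sketch. It is not the CONREAL implementation: the function names, the toy count matrices, the score threshold, and the simple longest-chain rule are all assumptions made for illustration. The sketch scans two orthologous promoter sequences with positional weight matrices, pairs hits of the same matrix in both sequences, and keeps the longest set of pairs that appear in the same order in both sequences, which could then serve as alignment anchors.

```python
import math

def pwm_from_counts(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2-odds scores (assumed uniform background)."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column.get(base, 0) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm

def find_hits(seq, pwm, threshold):
    """Return start positions whose window scores at least `threshold` against the matrix."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases
        score = sum(column[base] for column, base in zip(pwm, window))
        if score >= threshold:
            hits.append(start)
    return hits

def candidate_anchors(seq_a, seq_b, pwms, threshold):
    """Pair every hit in seq_a with every hit of the same matrix in seq_b."""
    pairs = []
    for name, pwm in pwms.items():
        hits_b = find_hits(seq_b, pwm, threshold)
        for i in find_hits(seq_a, pwm, threshold):
            for j in hits_b:
                pairs.append((i, j, name))
    return sorted(pairs)

def chain_anchors(pairs):
    """Longest chain of pairs strictly increasing in both coordinates (quadratic DP).

    Simplification: overlapping sites are not excluded and hit scores are ignored.
    """
    if not pairs:
        return []
    best = [1] * len(pairs)
    prev = [-1] * len(pairs)
    for k, (i, j, _) in enumerate(pairs):
        for m in range(k):
            pi, pj, _ = pairs[m]
            if pi < i and pj < j and best[m] + 1 > best[k]:
                best[k] = best[m] + 1
                prev[k] = m
    k = max(range(len(pairs)), key=best.__getitem__)
    chain = []
    while k != -1:
        chain.append(pairs[k])
        k = prev[k]
    return chain[::-1]

# Hypothetical toy matrices built from made-up counts; real matrices would
# come from a curated collection of binding-site profiles.
pwms = {
    "TATA": pwm_from_counts([{b: 10} for b in "TATAAA"]),
    "GC": pwm_from_counts([{b: 10} for b in "GGGCGG"]),
}

# Made-up sequences standing in for two orthologous promoters.
seq_a = "ACGTTATAAAGCCGTGGGCGGTTAC"
seq_b = "TTATAAACATCAGGGCGGA"

anchors = chain_anchors(candidate_anchors(seq_a, seq_b, pwms, threshold=8.0))
print(anchors)
```

With these toy inputs the threshold of 8.0 admits only exact matches to each six-base consensus, so each matrix contributes one hit per sequence. In a real setting the regions between consecutive anchors would still need to be aligned by a conventional method, and the choice of threshold and chaining rule would strongly affect which sites are reported as conserved.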