Blanco Enrique, Messeguer Xavier, Smith Temple F, Guigó Roderic
Research Group in Biomedical Informatics, Institut Municipal d'Investigació Mèdica/Universitat Pompeu Fabra, Barcelona, Catalonia, Spain.
PLoS Comput Biol. 2006 May;2(5):e49. doi: 10.1371/journal.pcbi.0020049. Epub 2006 May 26.
We address the problem of comparing and characterizing the promoter regions of genes with similar expression patterns. This remains a challenging problem in sequence analysis because the promoter regions of co-expressed genes often show no discernible sequence conservation. Our approach therefore does not compare the nucleotide sequences of promoters directly. Instead, we obtain predictions of transcription factor binding sites, annotate the predicted sites with the labels of the corresponding binding factors, and align the resulting sequences of labels, which we refer to here as transcription factor maps (TF-maps). To obtain the global pairwise alignment of two TF-maps, we have adapted an algorithm initially developed to align restriction enzyme maps. We have optimized the parameters of the algorithm on a small but well-curated collection of human-mouse orthologous gene pairs. Results on this dataset, as well as on an independent and much larger dataset from the CISRED database, indicate that TF-map alignments can uncover conserved regulatory elements that typical sequence alignments cannot detect.
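The idea of aligning label sequences rather than nucleotides can be illustrated with a minimal sketch. This is not the published algorithm: the abstract does not give the scoring function of the adapted restriction-map method, so the sketch substitutes a plain Needleman-Wunsch-style dynamic program. The names `Site` and `align_tf_maps`, the `gap` penalty, and the example sites and scores are all hypothetical. Two predicted sites may be paired only when they carry the same factor label; a pairing earns the sum of the two prediction scores, and every unpaired site costs `gap`.

```python
from typing import List, Tuple

# Hypothetical representation of one predicted site:
# (factor label, position in the promoter, prediction score)
Site = Tuple[str, int, float]


def align_tf_maps(a: List[Site], b: List[Site], gap: float = 1.0):
    """Globally align two TF-maps (illustrative scoring, not the paper's)."""
    n, m = len(a), len(b)
    # dp[i][j] = best score for aligning a[:i] with b[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] - gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] - gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Leave a[i-1] or b[j-1] unaligned.
            best = max(dp[i - 1][j], dp[i][j - 1]) - gap
            # Pair the two sites only if their factor labels agree.
            if a[i - 1][0] == b[j - 1][0]:
                best = max(best, dp[i - 1][j - 1] + a[i - 1][2] + b[j - 1][2])
            dp[i][j] = best

    # Trace back to recover the paired sites.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        same_label = a[i - 1][0] == b[j - 1][0]
        if same_label and dp[i][j] == dp[i - 1][j - 1] + a[i - 1][2] + b[j - 1][2]:
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs


# Made-up example maps for a pair of orthologous promoters.
human = [("SP1", 10, 1.0), ("TATA", 50, 2.0), ("CEBP", 80, 1.0)]
mouse = [("SP1", 12, 1.0), ("NF1", 30, 1.0), ("TATA", 55, 2.0)]

score, pairs = align_tf_maps(human, mouse)
print(score)                         # 4.0
print([h[0] for h, _ in pairs])      # ['SP1', 'TATA']
```

In this toy case the SP1 and TATA sites are paired even though their positions differ between the two maps, while NF1 and CEBP remain unaligned. The sketch ignores site positions entirely; a map-alignment method of the kind described above would presumably also account for the spacing between sites.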