Sauer Tilman, Shelest Ekaterina, Wingender Edgar
Department of Bioinformatics, UKG, Georg-August-University of Goettingen, Goldschmidtstrasse 1, 37077 Goettingen, Germany.
Bioinformatics. 2006 Feb 15;22(4):430-7. doi: 10.1093/bioinformatics/bti819. Epub 2005 Dec 6.
'Phylogenetic footprinting' is a widely applied approach to identify regulatory regions and potential transcription factor binding sites (TFBSs) using alignments of non-coding orthologous regions from two or more organisms. A systematic evaluation of its validity and usability based on known TFBSs is needed to use phylogenetic footprinting most effectively in the identification of unknown TFBSs.
In this paper, we use 2678 human, mouse and rat TFBSs from the TRANSFAC database for this evaluation. To ensure the retrieval of correct orthologous sequences, we combine gene annotation and sequence homology searches. Requiring a sequence identity of at least 65% is most effective in discriminating TFBSs from non-functional sequence parts, while the choice of alignment algorithm has only a minor influence on TFBS identification by human-rodent comparisons. With this threshold, approximately 72% of the known TFBSs are found to be conserved, a proportion that varies significantly between transcription factors and also depends on the function of the regulated gene. TFBSs for certain transcription factors do not require strict sequence conservation but may instead show high pattern conservation, which somewhat limits the validity of purely sequence-based phylogenetic footprinting.
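As an illustration of the 65% identity criterion, the following is a minimal Python sketch, not the authors' implementation. It assumes that a pairwise alignment of the two orthologous regions is already available as two gapped strings of equal length, and that the known TFBS is given as a column range in that alignment. The function names `percent_identity` and `is_conserved`, the choice to count gap columns as mismatches, and the example sequences are all assumptions made for illustration; the abstract does not specify how identity was computed.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return the fraction of alignment columns holding the same base in both sequences.

    Assumption: columns containing a gap ('-') count as mismatches, and the
    denominator is the total number of alignment columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return matches / len(aligned_a)


def is_conserved(aligned_a: str, aligned_b: str,
                 start: int, end: int, threshold: float = 0.65) -> bool:
    """Check whether alignment columns [start, end) reach the identity threshold.

    The default threshold of 0.65 mirrors the 65% cutoff named in the abstract.
    """
    return percent_identity(aligned_a[start:end], aligned_b[start:end]) >= threshold


# Hypothetical aligned human and mouse promoter fragments (made-up sequences).
human = "TTGACGTCAGCG"
mouse = "TTGACGTTAGAG"

# A hypothetical TFBS occupying alignment columns 2..9 (end index exclusive).
print(percent_identity(human[2:10], mouse[2:10]))  # 0.875
print(is_conserved(human, mouse, 2, 10))           # True
```

Because this check looks only at column-wise identity, it would miss sites that keep a binding pattern while diverging in sequence, which is the limitation noted at the end of the abstract.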