Hu Zihua, Hu Boyu, Collins James F
New York State Center of Excellence in Bioinformatics and Life Sciences, Department of Biostatistics, Department of Medicine, University at Buffalo, State University of New York (SUNY), Buffalo, NY 14260, USA.
Genome Biol. 2007;8(12):R257. doi: 10.1186/gb-2007-8-12-r257.
Previous methods for identifying synergistic transcription factors (TFs) rely on either TF enrichment among co-regulated genes or phylogenetic footprinting. Both approaches have been successful, but both have limitations.
We propose a new strategy that identifies synergistic TFs by function conservation. Instead of aligning the regulatory sequences of orthologous genes and then searching the alignment for conserved TF binding sites (TFBSs), we developed computational methods that proceed in three steps. First, combinatorial TFBS enrichment is performed under distance constraints. Second, the human and mouse orthologous genes whose regulatory sequences contain the enriched TFBS combinations are tested for enrichment of their overlap. Third, function conservation from both the TFBSs and the overlapping orthologous genes is integrated by correlation analyses. Applied to genome-wide promoter analyses, these methods identified 51 homotypic TF combinations; their validity is supported by known TF-TF interactions and by function coherence analyses. We further provide computational evidence that the methods identify synergistic TFs to a much greater extent than phylogenetic footprinting.
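The first two steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which statistical test was used, so the hypergeometric upper-tail test here is an assumption, and the function names, the `TF_A` label, the gene identifiers, the site positions, and the 50 bp distance constraint are all hypothetical toy values.

```python
from math import comb


def pairs_within_distance(sites_a, sites_b, max_gap):
    """Return (a, b) site-position pairs that lie at most max_gap apart.

    Pairs with a == b are skipped so that a homotypic combination
    (the same TF on both sides) is not matched against itself.
    """
    return [(a, b) for a in sites_a for b in sites_b
            if 0 < abs(a - b) <= max_gap]


def genes_with_combination(promoters, tf_a, tf_b, max_gap):
    """Return the genes whose promoter holds the TFBS pair within max_gap."""
    return {
        gene for gene, sites in promoters.items()
        if pairs_within_distance(sites.get(tf_a, []), sites.get(tf_b, []), max_gap)
    }


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) for a hypergeometric variable X (assumed test)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(comb(successes, k) * comb(population - successes, draws - k)
               for k in range(observed, upper + 1))
    return tail / total


# Toy data: predicted site positions per promoter, keyed by ortholog group.
human = {"g1": {"TF_A": [10, 40]}, "g2": {"TF_A": [5, 300]},
         "g3": {"TF_A": [100, 120]}, "g4": {"TF_A": [50]}}
mouse = {"g1": {"TF_A": [12, 45]}, "g2": {"TF_A": [7]},
         "g3": {"TF_A": [90, 115]}, "g4": {"TF_A": [20, 30]}}

MAX_GAP = 50  # hypothetical distance constraint in base pairs

human_hits = genes_with_combination(human, "TF_A", "TF_A", MAX_GAP)
mouse_hits = genes_with_combination(mouse, "TF_A", "TF_A", MAX_GAP)
overlap = human_hits & mouse_hits

# Probability of seeing at least this many shared ortholog groups by chance.
p_value = hypergeom_upper_tail(len(human), len(human_hits),
                               len(mouse_hits), len(overlap))
print(sorted(overlap), p_value)
```

With only four ortholog groups the p-value carries no weight; a real analysis would run over genome-wide promoter sets and correct for the number of TF combinations tested.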
Function conservation, based on the concordance of combinatorial TFBS enrichment with enrichment of overlapping orthologous genes, proved to be a successful means of identifying synergistic TFs. Because the approach does not depend on sequence alignment, it avoids the limitations of phylogenetic footprinting. It uses existing gene annotation data, such as those available in GO, and thus provides an alternative method for functional TF discovery and annotation.