Jegga Anil G, Chen Jing, Gowrisankar Sivakumar, Deshmukh Mrunal A, Gudivada RangaChandra, Kong Sue, Kaimal Vivek, Aronow Bruce J
Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA.
Nucleic Acids Res. 2007 Jan;35(Database issue):D116-21. doi: 10.1093/nar/gkl1011. Epub 2006 Dec 18.
Transcriptional cis-regulatory control regions are frequently found within non-coding DNA segments conserved across multi-species gene orthologs. Adopting a systematic gene-centric pipeline approach, we report here the development of a web-accessible database resource, GenomeTraFac (http://genometrafac.cchmc.org), that allows genome-wide detection and characterization of compositionally similar cis-clusters that occur in gene orthologs between any two genomes, for both microRNA genes and conventional RNA-encoding genes. Each ortholog gene pair can be scanned to visualize overall conserved sequence regions and, within these, the relative density of conserved cis-element motif clusters, which appears as peaks in a graph. The results of these analyses can be mined en masse to identify the most frequently represented cis-motifs in a list of genes. The system also provides a method for rapid evaluation and visualization of gene-model consistency between orthologs, and facilitates consideration of the potential impact of sequence variation in conserved non-coding regions on complex cis-element structures. Using the mouse and human genomes via the NCBI Reference Sequence database and the Sanger Institute miRBase, the system demonstrated the ability to identify validated transcription factor targets within promoter and distal genomic regulatory regions of both conventional and microRNA genes.
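The two analyses named in the abstract, motif-cluster density along a conserved region and mining the most frequently represented cis-motifs across a gene list, can be illustrated with a minimal sketch. This is not GenomeTraFac's implementation: the input layout (a list of `(motif_name, position)` hits per gene), the function names, the window sizes, and the motif and gene labels are all assumptions made for illustration.

```python
from collections import Counter


def motif_density(hits, region_length, window=200, step=50):
    """Count conserved motif hits inside each sliding window of a region.

    hits: list of (motif_name, position) pairs (hypothetical input format).
    Returns a list of (window_start, hit_count) pairs; high counts correspond
    to the peaks one would see when the density is plotted.
    """
    starts = range(0, max(region_length - window, 0) + 1, step)
    return [
        (start, sum(1 for _, pos in hits if start <= pos < start + window))
        for start in starts
    ]


def most_frequent_motifs(hits_by_gene, top=3):
    """Rank motifs by the number of genes in which they occur at least once."""
    counts = Counter()
    for hits in hits_by_gene.values():
        # A set ensures each motif is counted once per gene.
        counts.update({name for name, _ in hits})
    return counts.most_common(top)


# Illustrative data only; motif names, gene labels, and positions are made up.
example_hits = {
    "geneA": [("SP1", 120), ("MYOD", 150), ("SP1", 900)],
    "geneB": [("SP1", 40), ("NFKB", 300)],
}
density = motif_density(example_hits["geneA"], region_length=1000, window=200, step=200)
ranking = most_frequent_motifs(example_hits, top=1)
```

Counting each motif once per gene, rather than once per hit, keeps a single motif-dense gene from dominating the ranking; whether the actual resource ranks motifs this way is not stated in the abstract.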