Karanam Suresh, Moreno Carlos S
Program in Bioinformatics, School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA.
Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W475-84. doi: 10.1093/nar/gkh353.
The advent of DNA microarray technology and the sequencing of multiple vertebrate genomes have provided a unique opportunity to integrate comparative genomics with high-throughput gene expression analysis. Here we describe the conserved transcription factor binding site (CONFAC) software, which enables high-throughput identification of conserved transcription factor binding sites (TFBSs) in the regulatory regions of hundreds of genes at a time (http://morenolab.whitehead.emory.edu/cgi-bin/confac/login.pl). CONFAC compares non-coding regulatory sequences between the human and mouse genomes to identify conserved TFBSs. It then uses a Mann-Whitney U-test to determine which of these TFBSs are significantly enriched in the promoters of gene clusters from microarray analyses relative to sets of unchanging control genes. Analysis of random gene sets demonstrated that, with this approach, over 98% of TFBSs had false positive rates below 5%. As a proof of principle, we validated the CONFAC software using gene sets from four separate microarray studies and identified TFBSs known to be functionally important for the regulation of each of the four gene sets.
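The enrichment step described above can be illustrated with a minimal sketch. This is not the CONFAC implementation. It assumes that each gene contributes one count of conserved sites per TFBS, and that the counts for a gene cluster are compared with those for a control set by a two-sided Mann-Whitney U-test (normal approximation with tie and continuity corrections). The function names, the labels `TF_A` and `TF_B`, the 0.05 threshold, and all counts are hypothetical.

```python
import math
from itertools import chain


def mann_whitney_u(sample_a, sample_b):
    """Two-sided Mann-Whitney U-test via the normal approximation.

    Returns (U statistic of sample_a, p-value). Ties receive average ranks
    and the variance is corrected for them.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a == 0 or n_b == 0:
        raise ValueError("both samples must be non-empty")
    n = n_a + n_b
    # Pool the values, remembering which sample each came from (0 = a, 1 = b).
    pooled = sorted(chain(((v, 0) for v in sample_a),
                          ((v, 1) for v in sample_b)))
    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Positions i..j-1 hold equal values; they share ranks i+1..j.
        avg_rank = (i + 1 + j) / 2.0
        size = j - i
        tie_term += size ** 3 - size
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0
    mean_u = n_a * n_b / 2.0
    var_u = n_a * n_b / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_a, 1.0  # every value identical: no evidence of a difference
    # Continuity correction, clamped so that z is never negative.
    z = max(abs(u_a - mean_u) - 0.5, 0.0) / math.sqrt(var_u)
    return u_a, math.erfc(z / math.sqrt(2.0))


def enriched_tfbs(cluster_counts, control_counts, alpha=0.05):
    """Return {tfbs: p} for TFBSs with higher counts in the cluster (p < alpha)."""
    hits = {}
    for tfbs, counts in cluster_counts.items():
        control = control_counts[tfbs]
        u, p = mann_whitney_u(counts, control)
        higher_in_cluster = u > len(counts) * len(control) / 2.0
        if higher_in_cluster and p < alpha:
            hits[tfbs] = p
    return hits


# Hypothetical per-gene counts of conserved sites (made-up numbers).
cluster = {"TF_A": [3, 4, 2, 5, 3, 4, 6, 3], "TF_B": [0, 1, 0, 1, 1, 0, 0, 1]}
control = {"TF_A": [0, 1, 0, 1, 0, 2, 1, 0], "TF_B": [1, 0, 0, 1, 0, 1, 1, 0]}

for name, p_value in enriched_tfbs(cluster, control).items():
    print(f"{name}: p = {p_value:.4g}")
```

With these invented counts only `TF_A` passes the threshold. A real analysis would involve hundreds of genes per set, and the false positive rate reported in the abstract was estimated from random gene sets, a step this sketch does not reproduce.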