Ho Sui Shannan J, Mortimer James R, Arenillas David J, Brumm Jochen, Walsh Christopher J, Kennedy Brian P, Wasserman Wyeth W
Centre for Molecular Medicine and Therapeutics, University of British Columbia, Vancouver, BC, Canada.
Nucleic Acids Res. 2005 Jun 2;33(10):3154-64. doi: 10.1093/nar/gki624. Print 2005.
Targeted transcript profiling studies can identify sets of co-expressed genes; however, identification of the underlying functional mechanism(s) is a significant challenge. Established methods for the analysis of gene annotations, particularly those based on the Gene Ontology, can identify functional linkages between genes. Similar methods for the identification of over-represented transcription factor binding sites (TFBSs) have been successful in yeast, but extension to human genomics has largely proved ineffective. Creation of a system for the efficient identification of common regulatory mechanisms in a subset of co-expressed human genes promises to break a roadblock in functional genomics research. We have developed an integrated system that searches for evidence of co-regulation by one or more transcription factors (TFs). oPOSSUM combines a pre-computed database of conserved TFBSs in human and mouse promoters with statistical methods for identification of sites over-represented in a set of co-expressed genes. The algorithm successfully identified mediating TFs in control sets of tissue-specific genes and in sets of co-expressed genes from three transcript profiling studies. Simulation studies indicate that oPOSSUM produces few false positives using empirically defined thresholds and can tolerate up to 50% noise in a set of co-expressed genes.
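The abstract does not state which statistic oPOSSUM applies, so the following is only a minimal sketch of how over-representation of a TFBS in a co-expressed gene set might be tested. It assumes a one-tailed hypergeometric test (equivalent to a one-sided Fisher exact test) on gene counts, with the background taken to be all genes in the pre-computed database, including the co-expressed set. The function names `hypergeom_upper_tail` and `rank_tfs`, the labels `TF_A` and `TF_B`, and all counts are hypothetical and are not taken from the paper.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background (assumed to include the co-expressed set)
    K: background genes with at least one conserved site for the TF
    n: genes in the co-expressed set
    k: co-expressed genes with at least one conserved site for the TF
    """
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible terms drop out of the sum automatically.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def rank_tfs(target_counts, background_counts, n_target, n_background):
    """Rank TFs by one-tailed p-value, most over-represented first."""
    scored = [
        (tf, hypergeom_upper_tail(k, background_counts[tf], n_target, n_background))
        for tf, k in target_counts.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy numbers for illustration only; not results from the study.
    target = {"TF_A": 12, "TF_B": 3}
    background = {"TF_A": 300, "TF_B": 1500}
    for tf, p in rank_tfs(target, background, n_target=20, n_background=10000):
        print(f"{tf}\t{p:.3g}")
```

With these toy counts, `TF_A` has sites in 12 of 20 co-expressed genes against an expected 0.6, so it ranks first. `TF_B` matches its background rate and is not flagged. A practical analysis would also correct for testing many TFs at once and would choose significance thresholds empirically, as the abstract describes.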