Aerts Stein, Van Loo Peter, Thijs Gert, Moreau Yves, De Moor Bart
Department of Electrical Engineering ESAT-SCD, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, Leuven, Belgium.
Bioinformatics. 2003 Oct;19 Suppl 2:ii5-14. doi: 10.1093/bioinformatics/btg1052.
The transcriptional regulation of a metazoan gene depends on the cooperative action of multiple transcription factors that bind to cis-regulatory modules (CRMs) located in the neighborhood of the gene. By integrating multiple signals, CRMs confer an organism-specific spatial and temporal rate of transcription.
Based on the hypothesis that genes that are needed in exactly the same conditions might share similar regulatory switches, we have developed a novel methodology to find CRMs in a set of coexpressed or coregulated genes. For a given gene set, the ModuleSearcher algorithm finds the best-scoring combination of transcription factor binding sites within a sequence window, using an A* procedure for tree searching. To keep the level of noise low, we use DNA sequences that are most likely to contain functional cis-regulatory information, namely conserved regions between human and mouse orthologous genes. The ModuleScanner performs genomic searches with a predicted CRM, or with a user-defined CRM known from the literature, to find possible target genes. The validity of a set of putative targets is checked using Gene Ontology annotations. We demonstrate the use and effectiveness of the ModuleSearcher and ModuleScanner algorithms and test their specificity and sensitivity on semi-artificial data. Next, we search for a module in a cluster of gene expression profiles of human cell cycle genes.
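The idea of an A*-style search for the best combination of binding sites can be illustrated with a small sketch. It assumes that binding sites have already been predicted for each gene as `(tf, position, score)` triples. It also assumes that a gene's score for a candidate module is the sum of each member factor's best site inside the best window of fixed width, and that the module score is the sum over all genes. This scoring scheme and the names `module_search`, `module_scan`, `gene_score` and `window_score` are hypothetical illustrations, not the published ModuleSearcher or ModuleScanner scoring functions. The search expands partial combinations in best-first order using an optimistic (admissible) upper bound, so the first complete module it pops is optimal under these assumptions.

```python
import heapq
from itertools import count


def window_score(sites, module, start, width):
    """Sum over TFs in `module` of each TF's best site score inside [start, start + width)."""
    best = {}
    for tf, pos, score in sites:
        if tf in module and start <= pos < start + width:
            best[tf] = max(best.get(tf, 0.0), score)
    return sum(best.values())


def gene_score(sites, module, width):
    """Best window score for one gene; anchoring windows at site positions suffices."""
    return max((window_score(sites, module, pos, width) for _, pos, _ in sites), default=0.0)


def module_search(genes, tfs, k, width):
    """Hypothetical A*-style best-first search for the k-TF set with the highest summed gene score."""
    tfs = sorted(set(tfs))
    # Per gene: best score of each TF anywhere in its sequence (used for the optimistic bound).
    global_best = [
        {tf: max((s for t, _, s in sites if t == tf), default=0.0) for tf in tfs}
        for sites in genes
    ]

    def upper_bound(module, next_idx):
        # Exact score of the partial module plus the best possible gain from the
        # (k - |module|) strongest remaining TFs; never underestimates a completion.
        need = k - len(module)
        total = 0.0
        for sites, gb in zip(genes, global_best):
            top = sorted((gb[tf] for tf in tfs[next_idx:]), reverse=True)[:need]
            total += gene_score(sites, module, width) + sum(top)
        return total

    tie = count()  # tie-breaker so heapq never has to compare the module tuples
    heap = [(-upper_bound((), 0), next(tie), (), 0)]
    while heap:
        neg_bound, _, module, next_idx = heapq.heappop(heap)
        if len(module) == k:
            # For a complete module the bound equals its exact score, and every
            # node still in the heap has a bound no higher, so this is optimal.
            return set(module), -neg_bound
        # Add TFs in sorted order so each combination is generated only once.
        for i in range(next_idx, len(tfs)):
            child = module + (tfs[i],)
            if len(tfs) - (i + 1) < k - len(child):
                break  # too few TFs left to complete the module
            heapq.heappush(heap, (-upper_bound(child, i + 1), next(tie), child, i + 1))
    return None, 0.0


def module_scan(genome, module, width, threshold):
    """Hypothetical scanner: rank genes whose best window for a fixed module scores >= threshold."""
    hits = [(name, gene_score(sites, module, width)) for name, sites in genome.items()]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: h[1], reverse=True)
```

As a toy check, take two genes in which factors A and B sit close together while C lies far away: `[("A", 10, 2.0), ("B", 20, 1.5), ("C", 300, 3.0)]` and `[("A", 5, 1.8), ("B", 30, 1.2), ("C", 500, 2.5)]`. With `k=2` and `width=50`, the search returns the module {A, B} with a summed score of 6.5. Passing that module to `module_scan` then ranks any further genes by how strongly they contain it, loosely mirroring the search-then-scan workflow described above.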
The ModuleSearcher is available as a web service within the TOUCAN workbench for regulatory sequence analysis, which can be downloaded from http://www.esat.kuleuven.ac.be/~dna/BioI.