Coppe Alessandro, Ferrari Francesco, Bisognin Andrea, Danieli Gian Antonio, Ferrari Sergio, Bicciato Silvio, Bortoluzzi Stefania
University of Padova, Department of Biology, Via G. Colombo 3, 35121, Padova, Italy.
Nucleic Acids Res. 2009 Feb;37(2):533-49. doi: 10.1093/nar/gkn948. Epub 2008 Dec 5.
Co-expressed genes may be under similar promoter-based and/or position-based regulation. Although data on the expression, position and function of human genes are available, their true integration remains a challenge for computational biology, which hampers the identification of regulatory mechanisms. We carried out an integrative analysis of the genomic position, functional annotation and promoters of genes expressed in myeloid cells. Promoter analysis was conducted with a novel multi-step method for discovering putative regulatory elements, i.e. motifs that are over-represented in a selected set of promoters as compared with a background model. The combination of transcriptional, structural and functional data allowed the identification of sets of promoters belonging to groups of genes that are co-expressed and co-localized in regions of the human genome. Applying motif discovery to 26 groups of genes that are co-expressed during myeloid cell differentiation and co-localized in the genome showed that promoters of co-expressed and co-localized genes contain more over-represented motifs than promoters of simply co-expressed genes (CEG). We identified motifs that are similar to the binding sequences of known transcription factors, non-uniformly distributed along promoter sequences and/or occurring in highly co-expressed subsets of genes. Co-expressed and co-localized gene sets were grouped into two co-expressed genomic meta-regions, putatively representing functional domains of higher-level regulation of expression.
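The idea of a motif being over-represented in a selected set of promoters relative to a background can be illustrated with a minimal sketch. This is not the multi-step method of the paper, whose details are not given in the abstract. It is a simple stand-in that assumes fixed-length k-mers as candidate motifs, the full promoter collection as the background, and a one-sided hypergeometric test on the number of promoters containing each k-mer. All function names, the k-mer length and the significance threshold are hypothetical choices made for illustration.

```python
from math import comb


def kmers_in(seq, k):
    """Return the set of distinct k-mers present in one promoter sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hypergeom_sf(x, N, K, n):
    """P(X >= x) when drawing n promoters from N, of which K contain the k-mer."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(x, upper + 1))
    return tail / comb(N, n)


def count_promoters_with_kmer(seqs, k):
    """Map each k-mer to the number of sequences that contain it at least once."""
    counts = {}
    for seq in seqs:
        for m in kmers_in(seq, k):
            counts[m] = counts.get(m, 0) + 1
    return counts


def overrepresented_kmers(selected, background, k=6, alpha=0.01):
    """List k-mers enriched in `selected` relative to `background`.

    Assumption: `background` is the full promoter collection and already
    includes every sequence in `selected`.
    Returns (kmer, selected_count, background_count, p_value) tuples,
    sorted by p-value. No multiple-testing correction is applied here.
    """
    N, n = len(background), len(selected)
    bg = count_promoters_with_kmer(background, k)
    fg = count_promoters_with_kmer(selected, k)
    hits = []
    for m, x in fg.items():
        p = hypergeom_sf(x, N, bg[m], n)
        if p <= alpha:
            hits.append((m, x, bg[m], p))
    return sorted(hits, key=lambda h: h[3])
```

In practice the p-values would need correction for the large number of k-mers tested, and a realistic background model would also account for base composition. Both are left out of this sketch.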