Caselle Michele, Di Cunto Ferdinando, Provero Paolo
Dipartimento di Fisica Teorica, Università di Torino, and INFN, Sezione di Torino, Torino, Italy.
BMC Bioinformatics. 2002;3:7. doi: 10.1186/1471-2105-3-7. Epub 2002 Feb 14.
Gene regulation in eukaryotes is mainly effected through transcription factors binding to rather short recognition motifs generally located upstream of the coding region. We present a novel computational method to identify regulatory elements in the upstream region of eukaryotic genes. The genes are grouped in sets sharing an overrepresented short motif in their upstream sequence. For each set, the average expression level from a microarray experiment is determined: if this level is significantly higher or lower than the average taken over the whole genome, then the overrepresented motif shared by the genes in the set is likely to play a role in their regulation.
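The grouping-and-comparison idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the overrepresentation criterion (a simple minimum occurrence count, `min_count`), the z-score statistic, the function names, and the toy sequences and expression values are all assumptions made for illustration only.

```python
import math
from statistics import mean, pstdev


def count_motif(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def motif_set(upstream, motif, min_count=2):
    """Genes whose upstream sequence contains the motif at least min_count times.

    Assumption: a fixed count threshold stands in for a proper
    statistical test of overrepresentation.
    """
    return {g for g, seq in upstream.items() if count_motif(seq, motif) >= min_count}


def set_z_score(expression, genes):
    """Z-score of the set's mean expression against the genome-wide mean.

    Assumption: the standard error is taken as sigma / sqrt(n), with sigma
    the genome-wide standard deviation and n the size of the set.
    """
    if not genes:
        raise ValueError("gene set is empty")
    values = list(expression.values())
    mu, sigma = mean(values), pstdev(values)
    set_mean = mean(expression[g] for g in genes)
    return (set_mean - mu) / (sigma / math.sqrt(len(genes)))


# Hypothetical toy data: upstream sequences and log expression ratios.
upstream = {
    "g1": "ACGTACGTAA",
    "g2": "TTACGTTACGT",
    "g3": "TTTTTTTTTT",
    "g4": "GGGGCCCCAA",
    "g5": "ACGTTTTTTT",
}
expression = {"g1": 2.0, "g2": 1.8, "g3": 0.1, "g4": -0.2, "g5": 0.0}

genes = motif_set(upstream, "ACGT")
z = set_z_score(expression, genes)
print(sorted(genes), round(z, 2))
```

A large positive or negative z-score would flag the motif as a candidate regulatory element; in a genome-wide scan the threshold would also need to account for the number of motifs tested.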
The method was tested by applying it to the genome of Saccharomyces cerevisiae, using the publicly available results of a DNA microarray experiment, in which expression levels for virtually all the genes were measured during the diauxic shift from fermentation to respiration. Several known motifs were correctly identified, and a new candidate regulatory sequence was determined.
We have described and successfully tested a simple computational method to identify upstream motifs relevant to gene regulation in eukaryotes by studying the statistical correlation between overrepresented upstream motifs and gene expression levels.