Ren Xin-Ying, Fiers Mark W E J, Stiekema Willem J, Nap Jan-Peter
Applied Bioinformatics, Plant Research International, NL-6700 AA Wageningen, The Netherlands.
Plant Physiol. 2005 Jun;138(2):923-34. doi: 10.1104/pp.104.055673. Epub 2005 May 27.
Gene expression in eukaryotic genomes is known to cluster, but cluster size is generally loosely defined and highly variable. Here we adopt a very strict definition of a cluster: a set of physically adjacent genes that are highly coexpressed and form a so-called local coexpression domain. The Arabidopsis (Arabidopsis thaliana) genome was analyzed for the presence of such local coexpression domains to elucidate its functional characteristics. We used expression data sets that cover different experimental conditions, organs, tissues, and cells from the Massively Parallel Signature Sequencing repository, and microarray data (Affymetrix) from a detailed root analysis. With these two data sources, we identified 689 and 1,481 local coexpression domains, respectively, each consisting of two to four genes with pairwise Pearson's correlation coefficients larger than 0.7. These numbers are approximately 1- to 5-fold higher than the numbers expected by chance. A small (5%-10%) yet significant fraction of genes in the Arabidopsis genome is therefore organized into local coexpression domains. These local coexpression domains were distributed over the whole genome. For the most part, genes in such local domains were not assigned to the same functional category (GOslim). Neither tandemly duplicated genes, shared promoter sequence, nor gene distance explained the occurrence of coexpression of genes in such chromosomal domains. This indicates that other parameters in genes or gene positions are important for establishing coexpression in local domains of Arabidopsis chromosomes.
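The domain definition in the abstract (physically adjacent genes whose pairwise Pearson's correlation coefficients all exceed 0.7) can be illustrated with a minimal sketch. The abstract does not describe the authors' actual procedure, so the greedy left-to-right extension, the non-overlapping domains, and the names `pearson` and `local_coexpression_domains` are assumptions made for illustration only. The input is assumed to be a list of gene identifiers in chromosomal order plus a mapping from each identifier to its expression profile.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant profile
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(sxx * syy)


def local_coexpression_domains(genes, profiles, threshold=0.7):
    """Return runs of adjacent genes in which every pair correlates above threshold.

    genes    -- gene ids in chromosomal order (assumed input format)
    profiles -- dict mapping gene id to a list of expression values
    Greedy, non-overlapping scan: an illustrative assumption, not the paper's method.
    """
    domains = []
    i = 0
    while i < len(genes):
        run = [genes[i]]
        j = i + 1
        # Extend the run while the next gene correlates with every gene already in it.
        while j < len(genes) and all(
            pearson(profiles[g], profiles[genes[j]]) > threshold for g in run
        ):
            run.append(genes[j])
            j += 1
        if len(run) >= 2:  # a domain needs at least two adjacent genes
            domains.append(run)
        i = j  # continue after the run so domains do not overlap
    return domains
```

Comparing the number of domains found this way with the number found after shuffling gene order would give a rough analogue of the "expected by chance" comparison, although the abstract does not specify how that expectation was computed.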