T-Life Research Center, Department of Physics, Fudan University, Shanghai, PR China.
Biol Direct. 2009 Nov 21;4:45; discussion 45. doi: 10.1186/1745-6150-4-45.
The compactness of highly or broadly expressed genes in humans has been explained by selection for efficiency, by regional mutation biases, or by genomic design. However, highly expressed genes in flowering plants were shown to be less compact than lowly expressed ones. Conversely, pollen-expressed Arabidopsis genes have been reported to contain shorter introns, and highly expressed moss genes are compact. This issue is important because it offers a chance to compare the selectionist and neutralist views of genome evolution. It also helps to clarify the fates of introns from the perspective of gene expression.
In this study, I used expression data covering more tissues and employed new analytical methods to reexamine the correlations between gene expression and gene structure in two flowering plants, Arabidopsis thaliana and Oryza sativa. Different aspects of expression pattern correlate with different parts of gene sequences in distinct ways. Specifically, expression level is significantly negatively correlated with gene size, especially the size of non-coding regions, whereas expression breadth correlates positively with non-coding structural parameters and negatively with coding-region parameters. Furthermore, the relationships between expression level and structural parameters appear to be non-linear, with the extremes of structural parameters possibly scaling as power-law or logarithmic functions of expression level.
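The two kinds of analysis described above can be illustrated with a minimal sketch. It is not the method used in the study: the abstract does not name the statistics, so the choice of Spearman rank correlation, the comparison of fits by R², and every function name and data value below are assumptions made for illustration only.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def compare_scaling(expression, extreme_length):
    """R^2 of two straight-line fits against log(expression).

    power_law:   log(L) = a + b * log(E)
    logarithmic: L      = a + b * log(E)
    """
    log_e = [math.log(e) for e in expression]
    log_l = [math.log(v) for v in extreme_length]
    return {
        "power_law": pearson(log_e, log_l) ** 2,
        "logarithmic": pearson(log_e, extreme_length) ** 2,
    }

# Invented toy values (not data from the study): maximum intron length
# halves each time the expression level doubles, i.e. L = 3200 / E.
toy_expression = [1, 2, 4, 8, 16, 32]
toy_max_intron_length = [3200, 1600, 800, 400, 200, 100]

rho = spearman(toy_expression, toy_max_intron_length)
fits = compare_scaling(toy_expression, toy_max_intron_length)
```

On this toy input the rank correlation is negative and the power-law fit has the higher R². Because the two fits use different response scales, their R² values are only a rough guide, and a real comparison would need a more careful model-selection procedure.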
In plants, highly expressed genes are compact, especially in the non-coding regions. Broadly expressed genes tend to contain longer non-coding sequences, which may be necessary for complex regulation. Combined with previous studies of other plants and of animals, some common patterns in the correlation between gene expression and gene structure begin to emerge. Based on the functional relationships between the extreme values of structural characteristics and expression level, an attempt was made to evaluate the relative merits of the energy-cost hypothesis and the time-cost hypothesis.