Dipartimento di Scienze Agronomiche e Genetica Vegetale Agraria, Università degli Studi di Sassari, Sassari, Italy.
PLoS One. 2011;6(8):e22855. doi: 10.1371/journal.pone.0022855. Epub 2011 Aug 1.
Genomic DNA sequences display compositional heterogeneity on many scales. In this paper we analyzed tendencies and anomalies in the occurrence of mono-, di- and trinucleotides in structural regions of plant genes. Representing these trends as a function of position along genic sequences highlighted compositional features specific to either monocots or eudicots, which were remarkably uniform within each of these two evolutionary clades. The most evident of these features took the form of a gradient of base content along the direction of transcription. The robustness of this representation was validated on sequence sub-datasets defined by structural and compositional features such as total CDS length, overall GC content and gene orientation in the genome. Piecewise regression analyses indicated that the gradients could be conveniently approximated by a two-segment model, in which a first region with a steep slope is followed by a second segment with a milder variation. In general, monocot species showed steeper segments than eudicots. The guanine gradient was the feature that most clearly distinguished the two clades: guanine content increased moderately in eudicots and decreased markedly in monocots. Analysis of individual genes revealed that a high proportion of them show compositional trends compatible with a segmented model, suggesting that these features are essential attributes of gene organization. Dinucleotide and trinucleotide biases were measured against the expectation based on a random union of the component elements. On average, the dinucleotide-level bias revealed a significant underrepresentation of some dinucleotides and an overrepresentation of others. The average bias at the trinucleotide level was low. Finally, the analysis of bryophyte coding sequences showed mononucleotide, dinucleotide and trinucleotide compositional trends resembling those of higher plants. This finding suggested that the emergence of compositional bias is an ancient event in evolution, already present when green plants colonized land.
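The abstract does not state the bias formula. One common statistic consistent with "expectation based on a random union of the component elements" is the relative abundance (odds ratio) used in genomic-signature studies: for a dinucleotide XY, ρ_XY = f_XY / (f_X · f_Y), and for a trinucleotide the simplest analogue is f_XYZ / (f_X · f_Y · f_Z). The sketch below assumes that definition. The paper's exact normalisation may differ; some authors, for instance, derive trinucleotide expectations from dinucleotide frequencies instead. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"


def kmer_freqs(seq: str, k: int) -> dict[str, float]:
    """Relative frequencies of all 4**k k-mers, counted on overlapping windows.

    Windows containing anything other than A, C, G or T (e.g. N) are skipped.
    """
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set(BASES)
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError(f"no valid {k}-mers in sequence")
    return {"".join(p): counts["".join(p)] / total for p in product(BASES, repeat=k)}


def random_union_bias(seq: str, k: int) -> dict[str, float]:
    """Observed k-mer frequency divided by the product of its base frequencies.

    1.0 -> as frequent as a random union of bases predicts;
    < 1 -> under-represented; > 1 -> over-represented;
    nan -> expectation is zero because a component base never occurs.
    """
    seq = seq.upper()
    f1 = kmer_freqs(seq, 1)  # mononucleotide frequencies f_X
    fk = kmer_freqs(seq, k)  # observed k-mer frequencies
    bias = {}
    for kmer, observed in fk.items():
        expected = math.prod(f1[base] for base in kmer)
        bias[kmer] = observed / expected if expected > 0 else math.nan
    return bias
```

For example, `random_union_bias(cds, 2)["CG"]` gives the CpG ratio of a coding sequence, and `random_union_bias(cds, 3)` gives the corresponding trinucleotide ratios. Averaging these values over many genes would give a dataset-level bias of the kind summarised above.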
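The two-segment approximation of a positional gradient can be illustrated with a continuous ("broken-stick") segmented regression, y = b0 + b1·x + b2·max(0, x − c). Here x is the position along the sequence (for example, a window index counted from the start codon), y is the base content in that window, and the breakpoint c is chosen by grid search to minimise the residual sum of squares. The slope is b1 before c and b1 + b2 after it. This is a minimal sketch assuming that model, non-overlapping windows and a grid search over observed positions. The paper's windowing and breakpoint estimation may differ, and `window_content` and `fit_two_segments` are hypothetical helpers.

```python
def window_content(seq: str, base: str, window: int) -> list[float]:
    """Fraction of `base` in consecutive non-overlapping windows (incomplete tail dropped)."""
    seq = seq.upper()
    return [seq[i:i + window].count(base) / window
            for i in range(0, len(seq) - window + 1, window)]


def _solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system")
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_two_segments(xs: list[float], ys: list[float], min_points: int = 3):
    """Grid-search the breakpoint c of y = b0 + b1*x + b2*max(0, x - c).

    xs must be sorted ascending. Each candidate c leaves at least `min_points`
    points on either side. Returns (rss, c, slope_before, slope_after).
    """
    best = None
    for c in xs[min_points - 1:len(xs) - min_points]:
        rows = [[1.0, x, max(0.0, x - c)] for x in xs]
        # Normal equations (X^T X) beta = X^T y for the 3-parameter hinge model.
        xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
        xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
        b0, b1, b2 = _solve(xtx, xty)
        rss = sum((y - (b0 + b1 * r[1] + b2 * r[2])) ** 2 for r, y in zip(rows, ys))
        if best is None or rss < best[0]:
            best = (rss, c, b1, b1 + b2)
    if best is None:
        raise ValueError("too few points for the requested min_points")
    return best
```

A steep first segment followed by a milder one shows up as |slope_before| > |slope_after|. Because the abstract reports trends at the level of whole clades, profiles would typically be averaged over many start-aligned CDSs before fitting; single-gene profiles are noisier.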