Rispe Claude, Legeai Fabrice, Gauthier Jean-Pierre, Tagu Denis
Institut National de la Recherche Agronomique, Domaine de la Motte, Unité Mixte de Recherche 1099 BIO3P, Le Rheu, France.
J Mol Evol. 2007 Oct;65(4):413-24. doi: 10.1007/s00239-007-9023-y. Epub 2007 Oct 10.
The aim of this study was to analyze patterns of nucleotide composition and codon usage in the genome of the pea aphid (Acyrthosiphon pisum). A collection of 60,000 expressed sequence tags (ESTs) from the pea aphid was used to automatically reconstruct 5809 coding sequences (CDSs), based on similarity with known proteins and on coding style recognition. Reconstructions were manually checked for ribosomal proteins, yielding a tentative reconstruction of the near-complete set for this category. Pea aphid coding sequences showed a shift toward AT (especially at the third codon position) compared to Drosophila homologues. Genes with a putatively high level of expression (ribosomal and other genes with high EST support) remained more GC3-rich and had a codon usage distinct from that of bulk sequences: they exhibited a preference for C-ending codons and for CGT (arginine), which thus appeared optimal for translation. However, the discrimination was not as strong as in Drosophila, suggesting a reduced degree of translational selection. The space of variation in codon usage for A. pisum appeared to be larger than in Drosophila, with a substantial fraction of genes that remained GC3-rich. Some of those (in particular some structural proteins) also showed high levels of codon bias and a very strong preference for C-ending codons, which could be explained either by strong translational selection or by other mechanisms. Finally, genomic traces were analyzed to build 206 fragments containing a full CDS, which allowed the correlations between the GC content of coding sequences and that of noncoding (flanking and intronic) sequences to be studied.
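For readers unfamiliar with the GC3 measure used in the abstract, the following is a minimal sketch of how GC content at the third codon position and raw codon counts could be computed for a single in-frame CDS. It is not the authors' pipeline: the function names `gc3` and `codon_counts` are hypothetical, and the sketch assumes the sequence starts in frame and ignores any trailing incomplete codon.

```python
from collections import Counter


def gc3(cds: str) -> float:
    """Fraction of G or C among third codon positions of an in-frame CDS."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3   # drop a trailing incomplete codon
    thirds = cds[2:usable:3]           # bases at positions 3, 6, 9, ...
    if not thirds:
        return 0.0
    return sum(base in "GC" for base in thirds) / len(thirds)


def codon_counts(cds: str) -> Counter:
    """Count occurrences of each codon in an in-frame CDS."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


if __name__ == "__main__":
    example = "ATGCGTGCC"              # toy sequence: ATG CGT GCC
    print(gc3(example))                # third positions are G, T, C
    print(codon_counts(example))
```

Comparing such per-gene GC3 values and codon counts between highly expressed genes (for example ribosomal proteins) and the remaining sequences is the kind of contrast the abstract describes.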