Center for Computational Biology and Laboratory of Disease Genomics and Individualized Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China.
Mol Biol Evol. 2013 Dec;30(12):2699-708. doi: 10.1093/molbev/mst167. Epub 2013 Sep 26.
Studies of protein evolution have focused on amino acid substitutions, with much less systematic analysis of insertions and deletions (indels) in protein-coding genes. We therefore surveyed 7,500 genes between Drosophila melanogaster and D. simulans, using D. yakuba as an outgroup. The evolutionary rate of coding indels is indeed low, at only 3% of that of nonsynonymous substitutions. Because coding indels follow a geometric distribution in size and tend to fall in low-complexity regions of proteins, it is unclear whether selection or mutation underlies this low rate. To resolve the issue, we collected genomic sequences from an isogenic African line of D. melanogaster (ZS30) at a high coverage of 70× and analyzed indel polymorphism between ZS30 and the reference genome. In comparing polymorphism and divergence, we found that the divergence-to-polymorphism ratio (i.e., fixation index) for smaller indels (size ≤ 10 bp) is very similar to that for synonymous changes, suggesting that most of the within-species polymorphism and between-species divergence for indels is selectively neutral. Interestingly, deletions of larger sizes (size ≥ 11 bp and ≤ 30 bp) have a much higher fixation index than synonymous mutations, and 44.4% of fixed middle-sized deletions are estimated to be adaptive. To our surprise, this pattern is not found for insertions. Protein indel evolution appears to be in a dynamic flux of neutrally driven expansion (insertions) together with adaptively driven contraction (deletions). These observations provide important insights for understanding the fitness of new mutations as well as the driving forces of genomic evolution in Drosophila species.
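The comparison of fixation indices described above can be illustrated with a short calculation. The sketch below assumes a McDonald-Kreitman-style estimator, in which the adaptive fraction of fixed changes is one minus the ratio of the neutral (synonymous) fixation index to that of the tested class. The abstract does not state which estimator the authors used, so this is an assumption. The function names and all counts are hypothetical and chosen only for illustration.

```python
def fixation_index(divergence: int, polymorphism: int) -> float:
    """Divergence-to-polymorphism ratio for one class of changes."""
    if polymorphism <= 0:
        raise ValueError("polymorphism count must be positive")
    return divergence / polymorphism


def adaptive_fraction(d_test: int, p_test: int,
                      d_neutral: int, p_neutral: int) -> float:
    """Estimated fraction of fixed changes in the test class that are adaptive.

    Assumes a McDonald-Kreitman-style estimator:
        alpha = 1 - FI_neutral / FI_test
    where the neutral reference class is synonymous changes.
    """
    fi_test = fixation_index(d_test, p_test)
    fi_neutral = fixation_index(d_neutral, p_neutral)
    if fi_test == 0:
        raise ValueError("test class has no fixed differences")
    return 1.0 - fi_neutral / fi_test


# Hypothetical counts, not taken from the study.
alpha = adaptive_fraction(d_test=30, p_test=10, d_neutral=200, p_neutral=100)
print(f"estimated adaptive fraction: {alpha:.3f}")
```

Under this estimator, a test class whose fixation index equals the synonymous one gives an adaptive fraction of zero, which corresponds to the neutral pattern reported for indels of 10 bp or less.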