Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA.
Genome Res. 2011 Oct;21(10):1686-94. doi: 10.1101/gr.121327.111. Epub 2011 Jul 27.
Comparison of protein-coding DNA sequences from diverse primates can provide insight into these species' evolutionary history and uncover the molecular basis for their phenotypic differences. Currently, the number of available primate reference genomes limits these genome-wide comparisons. Here we use targeted capture methods designed for human to sequence the protein-coding regions, or exomes, of four non-human primate species (three Old World monkeys and one New World monkey). Despite average sequence divergence of up to 4% from the human sequence probes, we are able to capture ~96% of coding sequences. Using a combination of mapping and assembly techniques, we generated high-quality full-length coding sequences for each species. Both the number of nucleotide differences and the distribution of insertion and deletion (indel) lengths indicate that the quality of the assembled sequences is very high and exceeds that of most reference genomes. Using this expanded set of primate coding sequences, we performed a genome-wide scan for genes experiencing positive selection and identified a novel class of adaptively evolving genes involved in the conversion of epithelial cells in skin, hair, and nails to keratin. Interestingly, the genes we identify under positive selection also exhibit significantly increased allele frequency differences among human populations, suggesting that they play a role in both recent and long-term adaptation. We also identify several genes that have been lost on specific primate lineages, which illustrate the broad utility of this data set for other evolutionary analyses. These results demonstrate the power of second-generation sequencing in comparative genomics and greatly expand the repertoire of available primate coding sequences.