Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA.
Genome Res. 2010 Mar;20(3):301-10. doi: 10.1101/gr.102210.109. Epub 2010 Jan 12.
Here, we demonstrate how comparative sequence analysis facilitates genome-wide base-pair-level interpretation of individual genetic variation and address two questions of importance for human personal genomics: first, whether an individual's functional variation comes mostly from noncoding or coding polymorphisms; and, second, whether population-specific or globally present polymorphisms contribute more to functional variation in any given individual. Neither has been definitively answered by analyses of existing variation data because of a focus on coding polymorphisms, ascertainment biases in favor of common variation, and a lack of base-pair-level resolution for identifying functional variants. We resequenced 575 amplicons in 432 individuals at genomic sites enriched for evolutionary constraint and also analyzed variation within three published human genomes. We find that single-site measures of evolutionary constraint derived from mammalian multiple sequence alignments are strongly predictive of reductions in modern-day genetic diversity across a range of annotation categories and across the allele frequency spectrum from rare (<1%) to high frequency (>10% minor allele frequency). Furthermore, we show that putatively functional variation in an individual genome is dominated by polymorphisms that do not change protein sequence and that originate from our shared ancestral population and commonly segregate in human populations. These observations show that common, noncoding alleles contribute substantially to human phenotypes and that constraint-based analyses will be of value to identify phenotypically relevant variants in individual genomes.
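The abstract's central comparison asks whether sites under stronger evolutionary constraint carry less present-day diversity within each allele-frequency class. The sketch below illustrates that comparison under stated assumptions and is not the authors' pipeline. Only the <1% (rare) and >10% (high) minor-allele-frequency boundaries come from the text. The following are all hypothetical choices: the per-site constraint score, the cutoff separating "constrained" from "unconstrained" sites, the name "low" for the intermediate 1 to 10% bin, and the use of the fraction of polymorphic sites as the diversity measure.

```python
import math
from dataclasses import dataclass


@dataclass
class Site:
    """One aligned genomic position (hypothetical record layout)."""
    constraint: float               # hypothetical single-site constraint score; higher = more constrained
    allele_counts: tuple[int, int]  # (reference, alternate) allele counts across sampled chromosomes


def minor_allele_frequency(ref: int, alt: int) -> float:
    """Frequency of the less common allele at a biallelic site."""
    total = ref + alt
    if total == 0:
        raise ValueError("site has no called alleles")
    return min(ref, alt) / total


def frequency_class(maf: float) -> str:
    """Bin a MAF using the abstract's <1% (rare) and >10% (high) cutoffs."""
    if maf == 0:
        return "monomorphic"
    if maf < 0.01:
        return "rare"
    if maf <= 0.10:
        return "low"  # assumption: label for the 1-10% bin, which the abstract does not name
    return "high"


def polymorphic_fraction(sites: list[Site], freq_class: str) -> float:
    """Fraction of sites whose variant falls in the given frequency class."""
    if not sites:
        return math.nan
    hits = sum(
        1 for s in sites
        if frequency_class(minor_allele_frequency(*s.allele_counts)) == freq_class
    )
    return hits / len(sites)


def diversity_ratio(sites: list[Site], threshold: float, freq_class: str) -> float:
    """Constrained vs. unconstrained diversity within one frequency class.

    Values below 1 mean constrained sites (score >= threshold) carry fewer
    variants of that class than unconstrained sites. The threshold is a
    hypothetical parameter, not a value taken from the paper.
    """
    constrained = [s for s in sites if s.constraint >= threshold]
    unconstrained = [s for s in sites if s.constraint < threshold]
    baseline = polymorphic_fraction(unconstrained, freq_class)
    if math.isnan(baseline) or baseline == 0:
        return math.nan  # no usable unconstrained baseline
    return polymorphic_fraction(constrained, freq_class) / baseline
```

The ratio can be computed separately for each annotation category (for example, coding versus noncoding sites) and each frequency class. A ratio consistently below 1 would correspond to the pattern the abstract reports. A real analysis would also need to account for sequencing coverage and ascertainment, which this sketch ignores.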