Gao Feng, Keinan Alon
BMC Genomics. 2014;15 Suppl 4(Suppl 4):S3. doi: 10.1186/1471-2164-15-S4-S3. Epub 2014 May 20.
Recent studies have shown that human populations have experienced a complex demographic history, including a recent epoch of rapid population growth that led to an excess in the proportion of rare genetic variants in humans today. This excess can impact the burden of private mutations for each individual, defined here as the proportion of heterozygous variants in each newly sequenced individual that are novel compared to another large sample of sequenced individuals.
We calculated the burden of private mutations predicted by different demographic models and compared it with empirical estimates based on data from the NHLBI Exome Sequencing Project and from the Neutral Regions (NR) dataset. We observed a significant excess in the proportion of private mutations in the empirical data compared with models of demographic history without a recent epoch of population growth. Incorporating recent growth into the model provides a much improved fit to empirical observations. This phenomenon becomes more marked for larger sample sizes. For example, extrapolating to a scenario in which 10,000 individuals from the same population have been sequenced with perfect accuracy, about 1 in 400 heterozygous sites (or about 6,000 variants) in the 10,001st individual are still predicted to be novel, 18 times the proportion predicted in the absence of recent population growth. The proportion of private mutations is further increased by purifying selection, which differentially affects mutations of different functional annotations.
The burden of private mutations for each individual, which are singletons (i.e. appearing in a single copy) in a larger sample that includes this individual, is predicted to be greatly increased by recent population growth, as well as by purifying selection. Comparison with empirical data supports the conclusion that European populations have experienced recent rapid population growth, consistent with previous studies. These results have important implications for the design and analysis of sequencing-based association studies of complex human disease as they pertain to private and very rare variants. They also imply that personalized genomics will indeed have to be very personal in accounting for the large number of private mutations.
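The definition above (private mutations as singletons in a larger sample that includes the individual) can be illustrated with a minimal sketch that turns an expected site frequency spectrum (SFS) into a proportion of private heterozygous sites. This is not the authors' method or any of their demographic models: the function names are hypothetical, the baseline is the textbook constant-size neutral model with expected SFS entries proportional to 1/i, and only derived-allele singletons are counted as private.

```python
def neutral_sfs(n, theta=1.0):
    """Expected unfolded SFS for n chromosomes under the standard neutral
    constant-size model (an assumption of this sketch): E[xi_i] = theta / i."""
    return [theta / i for i in range(1, n)]


def chromosomes(n_reference_individuals):
    """Chromosomes in the combined sample: the reference individuals plus
    one newly sequenced diploid individual."""
    return 2 * n_reference_individuals + 2


def private_fraction(sfs, n):
    """Fraction of one diploid individual's heterozygous sites whose derived
    allele is a singleton in the full sample of n chromosomes.

    sfs[i - 1] is the expected number of sites with i derived copies.
    """
    if len(sfs) != n - 1:
        raise ValueError("sfs must have n - 1 entries (derived counts 1..n-1)")
    pairs = n * (n - 1) / 2  # number of ways to pick the individual's two chromosomes
    # A site with i derived copies is heterozygous in a given pair of
    # chromosomes with probability i * (n - i) / pairs.
    het = sum(x * i * (n - i) / pairs for i, x in enumerate(sfs, start=1))
    # Private sites are the i = 1 term of the same sum.
    private = sfs[0] * (n - 1) / pairs
    return private / het


n = chromosomes(10_000)
fraction = private_fraction(neutral_sfs(n), n)
print(f"{n} chromosomes, private fraction = {fraction:.3e}")
```

Under this simplified baseline the fraction reduces to 2/n, that is 1/10,001 when 10,000 reference individuals are combined with one new individual. That value is not directly comparable to the figures quoted above, which come from the demographic models examined in the study. Passing an SFS with an excess of singletons, as expected under recent growth, to `private_fraction` raises the result.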