Lawrence H. Uricchio, Raul Torres, John S. Witte, Ryan D. Hernandez
Graduate Program in Bioinformatics, University of California, San Francisco, California, United States of America.
Genet Epidemiol. 2015 Jan;39(1):35-44. doi: 10.1002/gepi.21866. Epub 2014 Nov 21.
Demographic events and natural selection alter patterns of genetic variation within populations and may play a substantial role in shaping the genetic architecture of complex phenotypes and disease. However, the joint impact of these basic evolutionary forces is often ignored when assessing statistical tests of association. Here, we provide a simulation-based framework (sfs_coder) for generating DNA sequences that combines selection and demography with flexible models for simulating phenotypic variation. This tool also allows the user to perform locus-specific simulations by automatically querying annotated genomic functional elements and genetic maps. We demonstrate the effects of evolutionary forces on patterns of genetic variation by simulating recently inferred models of human selection and demography. We use these simulations to show that the demographic model and locus-specific features, such as the proportion of sites under selection, may have practical implications for estimating the statistical power of sequencing-based rare variant association tests. In particular, for some phenotype models, there may be higher power to detect rare variant associations in African populations than in non-African populations, but power is considerably reduced in regions of the genome with rampant negative selection. Furthermore, we show that existing methods for simulating large samples by resampling from a small set of observed haplotypes fail to recapitulate the distribution of rare variants in the presence of rapid population growth (as has been observed in several human populations).
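The limitation of haplotype resampling noted in the abstract can be illustrated with a toy sketch. This is not sfs_coder and does not reproduce the paper's simulations; the panel, the function names (`resample_haplotypes`, `allele_counts`, `segregating_sites`), and the sample sizes are all hypothetical. It only shows the structural reason resampling struggles with rare variants: drawing with replacement from a small panel can copy existing haplotypes but can never introduce a variant that the panel lacks, whereas rapid population growth is expected to produce many new rare variants as the sample grows.

```python
import random


def resample_haplotypes(panel, n, rng):
    """Draw n haplotypes with replacement from a small observed panel."""
    return [rng.choice(panel) for _ in range(n)]


def allele_counts(haplotypes):
    """Count derived alleles (1s) at each site across all haplotypes."""
    return [sum(site) for site in zip(*haplotypes)]


def segregating_sites(haplotypes):
    """Number of sites that are polymorphic in the sample."""
    n = len(haplotypes)
    return sum(1 for count in allele_counts(haplotypes) if 0 < count < n)


# Hypothetical toy panel: 4 haplotypes over 6 sites (1 = derived allele).
panel = [
    (1, 0, 0, 0, 1, 0),
    (0, 1, 0, 0, 1, 0),
    (0, 0, 1, 0, 0, 0),
    (0, 0, 0, 0, 0, 0),
]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
large_sample = resample_haplotypes(panel, 1000, rng)

panel_counts = allele_counts(panel)
large_counts = allele_counts(large_sample)

# Sites that are monomorphic in the panel stay monomorphic after resampling,
# so the large sample cannot gain any new (rare) variants.
print("segregating sites in panel:       ", segregating_sites(panel))
print("segregating sites after resampling:", segregating_sites(large_sample))
```

In this sketch the number of segregating sites in the resampled set can never exceed that of the panel, no matter how large the resampled set is, which is the opposite of what is expected for real samples from a rapidly growing population.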