Racimo Fernando, Schraiber Joshua G
Department of Integrative Biology, University of California, Berkeley, Berkeley, California, United States of America.
PLoS Genet. 2014 Nov 6;10(11):e1004697. doi: 10.1371/journal.pgen.1004697. eCollection 2014 Nov.
Quantifying the proportion of polymorphic mutations that are deleterious or neutral is of fundamental importance to our understanding of evolution, disease genetics and the maintenance of variation genome-wide. Here, we develop an approximation to the distribution of fitness effects (DFE) of segregating single-nucleotide mutations in humans. Unlike previous methods, we do not assume that synonymous mutations are neutral or not strongly selected, and we do not rely on fitting the DFE of all new nonsynonymous mutations to a single probability distribution, which is poorly motivated on a biological level. We rely on a previously developed method that utilizes a variety of published annotations (including conservation scores, protein deleteriousness estimates and regulatory data) to score all mutations in the human genome based on how likely they are to be affected by negative selection, controlling for mutation rate. We map this and other conservation scores to a scale of fitness coefficients via maximum likelihood using diffusion theory and a Poisson random field model on SNP data. Our method serves to approximate the deleterious DFE of mutations that are segregating, regardless of their genomic consequence. We can then compare the proportion of mutations that are negatively selected or neutral across various categories, including different types of regulatory sites. We observe that the distribution of intergenic polymorphisms is highly peaked at neutrality, while the distribution of nonsynonymous polymorphisms has a second peak at [Formula: see text]. Other types of polymorphisms have shapes that fall roughly in between these two. We find that transcriptional start sites, strong CTCF-enriched elements and enhancers are the regulatory categories with the largest proportion of deleterious polymorphisms.
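The mapping step described above, from a score bin to a fitness coefficient by maximum likelihood under a Poisson random field model, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes genic selection with a scaled coefficient gamma (taken here as 2*N_e*s, one common convention), a known scaled mutation rate theta, a stationary diffusion density normalized so that the neutral expectation is theta/i, and a simple grid search in place of a proper optimizer. The function names are hypothetical.

```python
import math

def selection_weight(x, gamma):
    """Ratio (1 - e^{-2*gamma*(1-x)}) / (1 - e^{-2*gamma}), written to avoid overflow.
    It reduces to (1 - x) in the neutral limit gamma -> 0."""
    a = -2.0 * gamma
    if abs(a) < 1e-12:
        return 1.0 - x
    if a > 0:  # deleterious: factor out e^{-a*x} so no large exponent is evaluated
        return math.exp(-a * x) * math.expm1(-a * (1.0 - x)) / math.expm1(-a)
    return math.expm1(a * (1.0 - x)) / math.expm1(a)

def expected_sfs(n, gamma, theta=1.0, steps=2000):
    """Expected number of sites with derived-allele count i = 1..n-1 in a sample
    of n chromosomes, by midpoint-rule integration of binomial sampling against
    the assumed stationary density theta * weight(x) / (x * (1 - x))."""
    dx = 1.0 / steps
    sfs = []
    for i in range(1, n):
        total = 0.0
        for k in range(steps):
            x = (k + 0.5) * dx
            # the 1/(x*(1-x)) factor is folded into the exponents
            total += x ** (i - 1) * (1.0 - x) ** (n - i - 1) * selection_weight(x, gamma)
        sfs.append(theta * math.comb(n, i) * total * dx)
    return sfs

def poisson_loglik(observed, expected):
    """Log-likelihood of independent Poisson counts, one per frequency class."""
    return sum(k * math.log(m) - m - math.lgamma(k + 1.0)
               for k, m in zip(observed, expected))

def fit_gamma(observed, n, grid, theta=1.0):
    """Grid-search maximum-likelihood estimate of gamma (assumed simplification)."""
    return max(grid, key=lambda g: poisson_loglik(observed, expected_sfs(n, g, theta)))

if __name__ == "__main__":
    neutral = expected_sfs(10, 0.0)
    deleterious = expected_sfs(10, -5.0)
    print("singleton share, neutral:    ", neutral[0] / sum(neutral))
    print("singleton share, deleterious:", deleterious[0] / sum(deleterious))
    print("fitted gamma:", fit_gamma(deleterious, 10, [-10.0, -5.0, -1.0, 0.0, 1.0]))
```

In this sketch, negative selection shifts the expected spectrum toward rare alleles, which is the signal the likelihood uses to assign a coefficient to each bin of scored SNPs. A real analysis would also need to estimate theta and account for demography, neither of which is handled here.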