Inference of site frequency spectra from high-throughput sequence data: quantification of selection on nonsynonymous and synonymous sites in humans.

Affiliation

Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JT, UK.

Publication information

Genetics. 2011 Aug;188(4):931-40. doi: 10.1534/genetics.111.128355. Epub 2011 May 19.

Abstract

Sequencing errors and random sampling of nucleotide types among sequencing reads at heterozygous sites present challenges for accurate, unbiased inference of single-nucleotide polymorphism genotypes from high-throughput sequence data. Here, we develop a maximum-likelihood approach to estimate the frequency distribution of the number of alleles in a sample of individuals (the site frequency spectrum), using high-throughput sequence data. Our method assumes binomial sampling of nucleotide types in heterozygotes and random sequencing error. By simulations, we show that close to unbiased estimates of the site frequency spectrum can be obtained if the error rate per base read does not exceed the population nucleotide diversity. We also show that these estimates are reasonably robust if errors are nonrandom. We then apply the method to infer site frequency spectra for zerofold degenerate, fourfold degenerate, and intronic sites of protein-coding genes using the low-coverage human sequence data produced by the 1000 Genomes Project phase-one pilot. By fitting a model to the inferred site frequency spectra that estimates parameters of the distribution of fitness effects of new mutations, we find evidence for significant natural selection operating on fourfold sites. We also find that a model with variable effects of mutations at synonymous sites fits the data significantly better than a model with equal mutational effects. Under the variable effects model, we infer that 11% of synonymous mutations are subject to strong purifying selection.
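
The abstract describes the likelihood model only in outline: binomial sampling of nucleotide types in heterozygotes plus random sequencing error. The Python sketch below shows one way such a model can be turned into a maximum-likelihood site frequency spectrum; it is an illustration under stated assumptions, not the authors' implementation. It assumes a biallelic site with known alleles, diploid individuals, a symmetric error model in which each read reports the other allele with probability `eps`, hypergeometric assignment of allele copies to individuals given the sample allele count K, and expectation-maximization (EM) as the optimizer. All function names are hypothetical.

```python
from math import comb


def genotype_likelihoods(n_alt, n_reads, eps):
    """P(read counts | genotype g) for g = 0, 1, 2 copies of the alt allele.

    Assumption (sketch only): binomial sampling of the two alleles in
    heterozygotes and a symmetric two-allele error model in which each read
    reports the wrong allele with probability eps.
    """
    likes = []
    for g in (0, 1, 2):
        # Probability that a single read shows the alt allele given genotype g.
        p_alt = (g / 2) * (1 - eps) + (1 - g / 2) * eps
        likes.append(comb(n_reads, n_alt) * p_alt ** n_alt * (1 - p_alt) ** (n_reads - n_alt))
    return likes


def site_allele_count_likelihoods(reads, eps):
    """P(data at one site | K alt alleles in the sample), for K = 0..2m.

    `reads` holds one (n_alt, n_reads) pair per diploid individual. Genotype
    configurations are summed by dynamic programming; each configuration is
    weighted by its hypergeometric probability prod C(2, g_i) / C(2m, K).
    """
    h = [1.0]  # h[k]: summed weight of configurations with k alt alleles so far
    for n_alt, n_reads in reads:
        gl = genotype_likelihoods(n_alt, n_reads, eps)
        new = [0.0] * (len(h) + 2)
        for k, val in enumerate(h):
            for g in (0, 1, 2):
                new[k + g] += val * comb(2, g) * gl[g]
        h = new
    n_chrom = 2 * len(reads)
    return [h[k] / comb(n_chrom, k) for k in range(n_chrom + 1)]


def estimate_sfs(sites, eps, n_iter=200):
    """Maximum-likelihood SFS across sites, maximized here with EM (an assumed choice)."""
    site_likes = [site_allele_count_likelihoods(r, eps) for r in sites]
    n_classes = len(site_likes[0])
    sfs = [1.0 / n_classes] * n_classes  # start from a uniform spectrum
    for _ in range(n_iter):
        expected = [0.0] * n_classes
        for likes in site_likes:
            # E-step: posterior probability of each allele count at this site.
            weights = [f * like for f, like in zip(sfs, likes)]
            total = sum(weights)
            for k, w in enumerate(weights):
                expected[k] += w / total
        # M-step: the updated SFS is the mean posterior over sites.
        sfs = [e / len(site_likes) for e in expected]
    return sfs
```

Here `sites` is a list of per-site lists of `(n_alt, n_reads)` pairs, and the returned spectrum has 2m + 1 entries for m individuals. The sketch multiplies raw probabilities for readability; at realistic coverage and sample sizes one would typically work in log space to avoid underflow.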

Cited by

Selection on synonymous sites: the unwanted transcript hypothesis.
Nat Rev Genet. 2024 Jun;25(6):431-448. doi: 10.1038/s41576-023-00686-7. Epub 2024 Jan 31.

Natural Selection Shapes Codon Usage in the Human Genome.
Am J Hum Genet. 2020 Jul 2;107(1):83-95. doi: 10.1016/j.ajhg.2020.05.011. Epub 2020 Jun 8.