Inference of site frequency spectra from high-throughput sequence data: quantification of selection on nonsynonymous and synonymous sites in humans.

Affiliation

Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JT, UK.

Publication information

Genetics. 2011 Aug;188(4):931-40. doi: 10.1534/genetics.111.128355. Epub 2011 May 19.

DOI: 10.1534/genetics.111.128355
PMID: 21596896
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3176106/
Abstract

Sequencing errors and random sampling of nucleotide types among sequencing reads at heterozygous sites present challenges for accurate, unbiased inference of single-nucleotide polymorphism genotypes from high-throughput sequence data. Here, we develop a maximum-likelihood approach to estimate the frequency distribution of the number of alleles in a sample of individuals (the site frequency spectrum), using high-throughput sequence data. Our method assumes binomial sampling of nucleotide types in heterozygotes and random sequencing error. By simulations, we show that close to unbiased estimates of the site frequency spectrum can be obtained if the error rate per base read does not exceed the population nucleotide diversity. We also show that these estimates are reasonably robust if errors are nonrandom. We then apply the method to infer site frequency spectra for zerofold degenerate, fourfold degenerate, and intronic sites of protein-coding genes using the low coverage human sequence data produced by the 1000 Genomes Project phase-one pilot. By fitting a model to the inferred site frequency spectra that estimates parameters of the distribution of fitness effects of new mutations, we find evidence for significant natural selection operating on fourfold sites. We also find that a model with variable effects of mutations at synonymous sites fits the data significantly better than a model with equal mutational effects. Under the variable effects model, we infer that 11% of synonymous mutations are subject to strong purifying selection.
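
The two modelling assumptions named in the abstract (binomial sampling of nucleotide types in heterozygotes, and random sequencing error) can be illustrated with a small genotype-likelihood calculation. The following is a minimal sketch, not the authors' implementation: it assumes a biallelic site, a single diploid individual, and an error model in which a miscalled read simply shows the other allele. The function name `genotype_likelihoods` and its parameters are hypothetical, chosen only for this example.

```python
from math import comb


def genotype_likelihoods(n_reads, n_alt, error_rate):
    """Likelihood of the read counts under each diploid genotype.

    Returns [L(hom-ref), L(het), L(hom-alt)], where n_alt of the
    n_reads reads show the alternate allele.
    """
    # Probability that one read shows the alternate allele, by genotype.
    # Assumption (not from the paper): an error flips the read to the
    # other allele at a biallelic site, so a heterozygote still yields
    # the alternate allele with probability 0.5.
    p_alt_by_genotype = (error_rate, 0.5, 1.0 - error_rate)

    # Binomial sampling of reads given that per-read probability.
    return [
        comb(n_reads, n_alt) * p ** n_alt * (1.0 - p) ** (n_reads - n_alt)
        for p in p_alt_by_genotype
    ]


# Low coverage: two reads, both showing the reference allele.
low_cov = genotype_likelihoods(n_reads=2, n_alt=0, error_rate=0.01)

# Four reads, one showing the alternate allele.
one_alt = genotype_likelihoods(n_reads=4, n_alt=1, error_rate=0.01)
```

With only two reads, both reference, the heterozygote likelihood is still 0.25, which is why calling genotypes directly from low-coverage data tends to miss heterozygous sites. A full site-frequency-spectrum estimator would combine such per-individual likelihoods across the sample and maximize over the spectrum; that step is not shown here.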

