Estimating population genetic parameters and comparing model goodness-of-fit using DNA sequences with error.

Affiliation

Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas 77030, USA.

Publication information

Genome Res. 2010 Jan;20(1):101-9. doi: 10.1101/gr.097543.109. Epub 2009 Dec 1.

Abstract

It is known that sequencing error can bias estimation of evolutionary or population genetic parameters. This problem is more prominent in deep resequencing studies because of their large sample size $n$ and a higher probability of error at each nucleotide site. We propose a new method based on the composite likelihood of the observed SNP configurations to infer population mutation rate $\theta = 4N_e\mu$, population exponential growth rate $R$, and error rate $\varepsilon$ simultaneously. Using simulation, we show the combined effects of the parameters $\theta$, $n$, $\varepsilon$, and $R$ on the accuracy of parameter estimation. We compared our maximum composite likelihood estimator (MCLE) of $\theta$ with other $\theta$ estimators that take into account the error. The results show the MCLE performs well when the sample size is large or the error rate is high. Using parametric bootstrap, composite likelihood can also be used as a statistic for testing the model goodness-of-fit of the observed DNA sequences. The MCLE method is applied to sequence data on the ANGPTL4 gene in 1832 African American and 1045 European American individuals.
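
To illustrate why the bias grows with sample size, the following minimal sketch computes the expected value of Watterson's estimator of theta when sequencing errors add spurious segregating sites. It is not the composite likelihood method of the paper. It assumes a constant-size neutral population under the infinite-sites model, independent errors with per-base probability `eps`, and that every site hit by at least one error appears as an extra segregating site. The function names and the parameter values (`theta=5.0`, `seq_len=1000`, `eps=1e-5`) are hypothetical choices made for illustration only.

```python
def harmonic(n):
    """Watterson's normalising constant a_n = sum_{i=1}^{n-1} 1/i."""
    return sum(1.0 / i for i in range(1, n))


def expected_watterson_with_error(theta, n, seq_len, eps):
    """Approximate expected Watterson estimate (per locus) under sequencing error.

    Assumptions (illustrative, not taken from the paper):
      - theta is the per-locus population mutation rate,
      - constant population size, infinite-sites model,
      - errors are independent across the n * seq_len sequenced bases,
      - each site hit by at least one error counts as a spurious segregating site.
    """
    a_n = harmonic(n)
    true_sites = theta * a_n                            # E[S] without error
    error_sites = seq_len * (1.0 - (1.0 - eps) ** n)    # sites with >= 1 error
    return (true_sites + error_sites) / a_n


if __name__ == "__main__":
    # The true sites grow like log(n), the spurious sites roughly like n,
    # so the inflation of the estimate increases with sample size.
    for n in (10, 100, 1000):
        est = expected_watterson_with_error(theta=5.0, n=n, seq_len=1000, eps=1e-5)
        print(n, round(est, 3))
```

Under these assumptions the spurious sites grow roughly linearly in $n$ while $a_n$ grows only logarithmically, which is one way to see why an estimator that ignores error degrades in deep resequencing samples.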
