

A novel approach to estimating heterozygosity from low-coverage genome sequence.

Affiliation

Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115.

Publication information

Genetics. 2013 Oct;195(2):553-61. doi: 10.1534/genetics.113.154500. Epub 2013 Aug 9.

Abstract

High-throughput shotgun sequence data make it possible in principle to accurately estimate population genetic parameters without confounding by SNP ascertainment bias. One such statistic of interest is the proportion of heterozygous sites within an individual's genome, which is informative about inbreeding and effective population size. However, in many cases the available sequence data for an individual are limited to low coverage, which prevents the confident genotype calling needed to count heterozygous sites directly. Here, we present a method for estimating an individual's genome-wide rate of heterozygosity from low-coverage sequence data without an intermediate genotype-calling step. Our method jointly learns the shared allele distribution between the individual and a panel of other individuals, together with the sequencing error distributions and the reference bias. We show that our method works well, first on simulated sequence data and second on real sequence data, where estimates obtained from low-coverage data are consistent with those from higher-coverage data. We apply our method to estimate the rate of heterozygosity for 11 humans from diverse worldwide populations, and this analysis reveals a complex dependency of local sequencing coverage on the true underlying heterozygosity, which complicates the estimation of heterozygosity from sequence data. We show how filters can be used to correct for the confounding arising from sequencing depth. In practice, we find that ratios of heterozygosity are more interpretable than absolute estimates, and we show that our ratios of heterozygosity agree closely with previous estimates from higher-coverage data.
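To make "estimating heterozygosity without calling genotypes" concrete, here is a minimal sketch. It is not the authors' method, which jointly fits a shared-allele distribution with a panel, sequencing error distributions and reference bias. Instead it swaps in a simpler moment estimator under stated assumptions: a known, uniform per-read error rate `err`, no reference bias, and reads drawn independently and with equal probability from the two alleles. At each site the fraction of read pairs that disagree has expectation h·δ_het + (1 − h)·δ_hom, so averaging it over sites and solving for h gives an estimate of heterozygosity without ever assigning a genotype. All function names, the depth-filter parameters and the toy simulator are hypothetical.

```python
import random
from collections import Counter

BASES = "ACGT"


def pair_mismatch_rate(genotype_het: bool, err: float) -> float:
    """Probability that two independent reads at one site show different bases.

    Assumption: each read comes from one of the two alleles with probability 1/2
    and is miscalled with probability `err`, uniformly to one of the 3 other bases.
    """
    if genotype_het:
        p_allele = (1 - err) / 2 + err / 6  # each of the two true alleles
        probs = [p_allele, p_allele, err / 3, err / 3]
    else:
        probs = [1 - err, err / 3, err / 3, err / 3]
    return 1 - sum(p * p for p in probs)


def estimate_heterozygosity(pileups, err, min_depth=2, max_depth=None):
    """Genotype-free heterozygosity estimate (hypothetical helper).

    pileups: iterable of strings, one per site, listing the observed read bases.
    Sites with depth outside [min_depth, max_depth] are filtered out.
    """
    total, n_sites = 0.0, 0
    for bases in pileups:
        d = len(bases)
        if d < max(2, min_depth) or (max_depth is not None and d > max_depth):
            continue
        n_pairs = d * (d - 1) // 2
        # Pairs of reads that carry the same base, counted per base.
        same_pairs = sum(c * (c - 1) // 2 for c in Counter(bases).values())
        total += (n_pairs - same_pairs) / n_pairs
        n_sites += 1
    if n_sites == 0:
        raise ValueError("no sites pass the depth filter")
    m = total / n_sites
    d_hom = pair_mismatch_rate(False, err)
    d_het = pair_mismatch_rate(True, err)
    # Solve m = h * d_het + (1 - h) * d_hom for h.
    return (m - d_hom) / (d_het - d_hom)


def simulate_pileups(n_sites, het_rate, err, max_reads=6, seed=0):
    """Toy simulator: depth ~ Binomial(max_reads, 0.5), alleles A/G at het sites."""
    rng = random.Random(seed)
    pileups = []
    for _ in range(n_sites):
        alleles = ("A", "G") if rng.random() < het_rate else ("A", "A")
        depth = sum(rng.random() < 0.5 for _ in range(max_reads))
        reads = []
        for _ in range(depth):
            base = rng.choice(alleles)
            if rng.random() < err:  # sequencing error to a different base
                base = rng.choice([b for b in BASES if b != base])
            reads.append(base)
        pileups.append("".join(reads))
    return pileups


if __name__ == "__main__":
    err = 0.01
    h1 = estimate_heterozygosity(simulate_pileups(200_000, 0.05, err, seed=1), err)
    h2 = estimate_heterozygosity(simulate_pileups(200_000, 0.025, err, seed=2), err)
    print(f"h1 ~ {h1:.4f}, h2 ~ {h2:.4f}, ratio ~ {h1 / h2:.2f}")
```

The `min_depth`/`max_depth` arguments only illustrate the kind of depth filter the abstract mentions. In this toy simulation depth is independent of genotype, so filtering changes the variance but not the expected value, whereas the abstract reports that real coverage does depend on heterozygosity. Any bias that scales both individuals' estimates by a common factor cancels in `h1 / h2`, which is one plausible reason ratios can be easier to interpret than absolute values.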



