Accounting for genotype uncertainty in the estimation of allele frequencies in autopolyploids.

Affiliations

Department of Evolution, Ecology and Organismal Biology, Ohio State University, 318 W. 12th Avenue, Columbus, OH, 43210, USA.

Department of Statistics, Ohio State University, 1958 Neil Avenue, Columbus, OH, 43210, USA.

Publication information

Mol Ecol Resour. 2016 May;16(3):742-54. doi: 10.1111/1755-0998.12493. Epub 2015 Dec 21.

Abstract

Despite the increasing opportunity to collect large-scale data sets for population genomic analyses, the use of high-throughput sequencing to study populations of polyploids has seen little application. This is due in large part to problems associated with determining allele copy number in the genotypes of polyploid individuals (allelic dosage uncertainty, ADU), which complicates the calculation of important quantities such as allele frequencies. Here, we describe a statistical model to estimate biallelic SNP frequencies in a population of autopolyploids using high-throughput sequencing data in the form of read counts. We bridge the gap from data collection (using restriction-enzyme-based techniques [e.g. GBS, RADseq]) to allele frequency estimation in a unified inferential framework using a hierarchical Bayesian model to sum over genotype uncertainty. Simulated data sets were generated under various conditions for tetraploid, hexaploid and octoploid populations to evaluate the model's performance and to help guide the collection of empirical data. We also provide an implementation of our model in the R package polyfreqs and demonstrate its use with two example analyses that investigate (i) levels of expected and observed heterozygosity and (ii) model adequacy. Our simulations show that the number of individuals sampled from a population has a greater impact on estimation error than sequencing coverage. The example analyses also show that our model and software can be used to make inferences beyond the estimation of allele frequencies for autopolyploids by providing assessments of model adequacy and estimates of heterozygosity.
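To make "summing over genotype uncertainty" concrete, the following is a minimal Python sketch for a single biallelic site. It is not the polyfreqs implementation, and the abstract does not give the model's equations, so everything here is an assumption for illustration: read counts are taken to be binomial given the genotype with a fixed sequencing error rate `eps`, genotypes are taken to be binomial given the allele frequency, the prior on the allele frequency is uniform, and the posterior is summarised on a grid rather than by whatever sampler the package uses. The function names are hypothetical.

```python
import math


def binom_pmf(k, n, q):
    """Binomial probability of k successes in n trials with success probability q."""
    return math.comb(n, k) * q**k * (1 - q) ** (n - k)


def site_likelihood(p, reads, ploidy, eps=0.01):
    """Likelihood of allele frequency p at one site.

    reads  : list of (reference_reads, total_reads) pairs, one per individual
    ploidy : number of allele copies per individual (4, 6, 8, ...)
    eps    : assumed per-read sequencing error rate (illustrative value)
    """
    like = 1.0
    for ref_reads, total_reads in reads:
        marginal = 0.0
        # Sum over every possible allele dosage g = 0..ploidy for this individual.
        for g in range(ploidy + 1):
            frac = g / ploidy
            # Probability that a read shows the reference allele, allowing for error.
            q = frac * (1 - eps) + (1 - frac) * eps
            marginal += binom_pmf(g, ploidy, p) * binom_pmf(ref_reads, total_reads, q)
        like *= marginal
    return like


def grid_posterior_mean(reads, ploidy, eps=0.01, grid_size=201):
    """Posterior mean of p under a uniform prior, approximated on an even grid."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    weights = [site_likelihood(p, reads, ploidy, eps) for p in grid]
    total = sum(weights)
    return sum(p * w for p, w in zip(grid, weights)) / total


if __name__ == "__main__":
    # Hypothetical read counts for five tetraploid individuals at one SNP.
    example_reads = [(5, 10), (10, 12), (0, 8), (3, 9), (7, 15)]
    print(grid_posterior_mean(example_reads, ploidy=4))
```

Because each individual's genotype is marginalised out rather than called, low-coverage individuals still contribute to the estimate, just with flatter per-individual likelihoods. This is one way to read the abstract's finding that adding individuals reduces estimation error more than adding coverage.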
