f-Statistics estimation and admixture graph construction with Pool-Seq or allele count data using the R package poolfstat.

Author information

Gautier Mathieu, Vitalis Renaud, Flori Laurence, Estoup Arnaud

Affiliations

CBGP, INRAE, CIRAD, IRD, Montpellier SupAgro, Univ Montpellier, Montpellier, France.

SelMet, INRAE, CIRAD, Montpellier SupAgro, Montpellier, France.

Publication information

Mol Ecol Resour. 2022 May;22(4):1394-1416. doi: 10.1111/1755-0998.13557. Epub 2021 Dec 17.

Abstract

By capturing various patterns of the structuring of genetic variation across populations, f-statistics have proved highly effective for the inference of demographic history. Such statistics are defined as covariances of SNP allele frequency differences among sets of populations without requiring haplotype information and are hence particularly relevant for the analysis of pooled sequencing (Pool-Seq) data. We here propose a reinterpretation of the F (and D) parameters in terms of probability of gene identity and derive from this unified definition unbiased estimators for both Pool-Seq data and standard allele count data obtained from individual genotypes. We implemented these estimators in a new version of the R package poolfstat, which now includes a wide range of inference methods: (i) three-population test of admixture; (ii) four-population test of treeness; (iii) f4-ratio estimation of admixture rates; and (iv) fitting, visualization and (semi-automatic) construction of admixture graphs. A comprehensive evaluation of the methods implemented in poolfstat on both simulated Pool-Seq (with various sequencing coverages and error rates) and allele count data confirmed the accuracy of these approaches, even for the most cost-effective Pool-Seq design involving relatively low sequencing coverages. We further analysed a real Pool-Seq data made of 14 populations of the invasive species Drosophila suzukii, which allowed refining both the demographic history of native populations and the invasion routes followed by this emblematic pest. Our new package poolfstat provides the community with a user-friendly and efficient all-in-one tool to unravel complex population genetic histories from large-size Pool-Seq or allele count SNP data.
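
For readers unfamiliar with the statistics named in the abstract, the commonly used definitions can be written as follows. This is a reference sketch only: the notation (population labels A, B, C, D and allele frequencies p_A, p_B, p_C, p_D at a SNP, with the expectation taken across SNPs) is an assumption for illustration and is not taken from the abstract, and these are the parameter definitions, not the unbiased estimators derived in the paper.

```latex
\begin{align}
F_2(A,B)     &= \mathbb{E}\!\left[(p_A - p_B)^2\right] \\
F_3(A;B,C)   &= \mathbb{E}\!\left[(p_A - p_B)(p_A - p_C)\right]
              = \tfrac{1}{2}\left[F_2(A,B) + F_2(A,C) - F_2(B,C)\right] \\
F_4(A,B;C,D) &= \mathbb{E}\!\left[(p_A - p_B)(p_C - p_D)\right]
              = \tfrac{1}{2}\left[F_2(A,D) + F_2(B,C) - F_2(A,C) - F_2(B,D)\right]
\end{align}
```

Under these definitions, a significantly negative F_3(A;B,C) is evidence that population A is admixed (the three-population test), and F_4(A,B;C,D) is expected to be zero when the four populations are related by an unadmixed tree with A and B on one side and C and D on the other (the four-population test of treeness).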
