ABC inference of multi-population divergence with admixture from unphased population genomic data.

Affiliation

Department of Biology, City College of New York, 160 Convent Ave., MR 526, New York, NY, 10031, USA.

Publication information

Mol Ecol. 2014 Sep;23(18):4458-71. doi: 10.1111/mec.12881. Epub 2014 Sep 6.

DOI: 10.1111/mec.12881

PMID: 25113024

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4285295/

Abstract

Rapidly developing sequencing technologies and declining costs have made it possible to collect genome-scale data from population-level samples in nonmodel systems. Inferential tools for historical demography given these data sets are, at present, underdeveloped. In particular, approximate Bayesian computation (ABC) has yet to be widely embraced by researchers generating these data. Here, we demonstrate the promise of ABC for analysis of the large data sets that are now attainable from nonmodel taxa through current genomic sequencing technologies. We develop and test an ABC framework for model selection and parameter estimation, given histories of three-population divergence with admixture. We then explore different sampling regimes to illustrate how sampling more loci, longer loci or more individuals affects the quality of model selection and parameter estimation in this ABC framework. Our results show that inferences improved substantially with increases in the number and/or length of sequenced loci, while less benefit was gained by sampling large numbers of individuals. Optimal sampling strategies given our inferential models included at least 2000 loci, each approximately 2 kb in length, sampled from five diploid individuals per population, although specific strategies are model and question dependent. We tested our ABC approach through simulation-based cross-validations and illustrate its application using previously analysed data from the oak gall wasp, Biorhiza pallida.
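For readers unfamiliar with ABC, the general logic of rejection-based model selection and parameter estimation can be sketched in a few lines of Python. This is a toy illustration, not the framework developed in the paper. The simulator is a hypothetical Gaussian stand-in for coalescent simulations. The two models ("M0", "M1"), the parameter `theta`, the uniform prior, the two summary statistics and the acceptance count are all assumptions chosen for brevity.

```python
import math
import random
import statistics


def simulate_summaries(model, theta, n_loci, rng):
    """Toy simulator (hypothetical stand-in for a coalescent simulator).

    Each locus yields one value centred on theta. Model "M1" has more
    among-locus variance than "M0". Returns two summary statistics:
    the mean and the standard deviation across loci.
    """
    sd = 1.0 if model == "M0" else 2.0
    values = [rng.gauss(theta, sd) for _ in range(n_loci)]
    return (statistics.fmean(values), statistics.stdev(values))


def abc_rejection(observed, n_loci, n_sims=4000, n_accept=40, seed=1):
    """Basic ABC rejection: keep the n_accept simulations closest to the data."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sims):
        model = rng.choice(["M0", "M1"])   # uniform prior over models
        theta = rng.uniform(0.0, 10.0)     # uniform prior on theta (assumed)
        sim = simulate_summaries(model, theta, n_loci, rng)
        # Euclidean distance between observed and simulated summaries
        draws.append((math.dist(observed, sim), model, theta))
    draws.sort(key=lambda d: d[0])
    accepted = draws[:n_accept]
    # Approximate posterior model probabilities: share of accepted draws per model
    model_probs = {
        m: sum(1 for _, am, _ in accepted if am == m) / len(accepted)
        for m in ("M0", "M1")
    }
    best = max(model_probs, key=model_probs.get)
    # Point estimate: mean of accepted theta values under the preferred model
    theta_hat = statistics.fmean([t for _, m, t in accepted if m == best])
    return model_probs, best, theta_hat


# Pseudo-observed data simulated under M0 with theta = 5 (a cross-validation-style check)
observed = simulate_summaries("M0", 5.0, 100, random.Random(42))
model_probs, best, theta_hat = abc_rejection(observed, n_loci=100)
print(model_probs, best, round(theta_hat, 2))
```

Changing `n_loci` in this sketch gives a rough feel for how the number of loci affects the precision of the summaries. It does not reproduce the paper's sampling experiments, which also vary locus length and the number of individuals.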
