Kohrn Brendan F, Persinger Jessica M, Cruzan Mitchell B
Department of Biology, Portland State University, 1719 SW 10th Avenue, Portland, Oregon 97201 USA.
Appl Plant Sci. 2017 Nov 14;5(11). doi: 10.3732/apps.1700053. eCollection 2017 Nov.
Seed dispersal contributes to gene flow and is responsible for colonization of new sites and range expansion. Sequencing chloroplast haplotypes offers a way to estimate contributions of seed dispersal to population genetic structure and enables studies of population history. Whole-genome sequencing is expensive, but resources can be conserved by pooling samples. Unfortunately, haplotype associations among single-nucleotide polymorphisms (SNPs) are lost in pooled samples, and treating SNP allele frequencies as independent markers provides biased estimates of genetic structure.
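A small illustration, under assumptions, of why haplotype associations cannot be read directly from pooled SNP frequencies: two hypothetical pools with different haplotypes yield identical per-SNP allele frequencies. The function name `pooled_snp_frequencies` and the two-SNP haplotypes are invented for this sketch and do not come from the paper.

```python
def pooled_snp_frequencies(haplotypes, freqs):
    """Return the alternate-allele frequency at each SNP for one pool.

    haplotypes: list of tuples of 0/1 alleles, one tuple per haplotype.
    freqs: haplotype frequencies in the pool, in the same order.
    """
    n_snps = len(haplotypes[0])
    return [
        sum(f * h[s] for h, f in zip(haplotypes, freqs))
        for s in range(n_snps)
    ]


# Hypothetical pool A: the two SNPs always occur together (haplotypes 00 and 11).
pool_a = pooled_snp_frequencies([(0, 0), (1, 1)], [0.5, 0.5])

# Hypothetical pool B: the two SNPs never occur together (haplotypes 01 and 10).
pool_b = pooled_snp_frequencies([(0, 1), (1, 0)], [0.5, 0.5])

# Both pools show the same SNP frequencies, so the frequencies alone
# cannot distinguish the two underlying sets of haplotypes.
print(pool_a, pool_b)
```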
We developed sampling methodologies and an application, CallHap, that uses a least-squares algorithm to evaluate the fit between observed and predicted SNP allele frequencies from pooled samples, based on the phylogenetic structure of the haplotype network. This makes pooled chloroplast sequencing practical for large-scale studies of chloroplast genomic variation. We tested the method using artificially constructed test networks and pools, and pooled samples of Lasthenia californica (California goldfields) from southern Oregon, USA.
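The least-squares idea can be sketched as follows. This is a minimal illustration, not CallHap's implementation: it assumes the candidate haplotypes are already known, that a pool holds a small known number of individuals, and it replaces whatever search CallHap performs with a brute-force enumeration of every possible pool composition. All function names, the haplotypes, and the observed frequencies are hypothetical.

```python
from itertools import combinations


def predicted_frequencies(haplotypes, counts):
    """SNP frequencies expected from a pool with the given haplotype counts."""
    total = sum(counts)
    n_snps = len(haplotypes[0])
    return [
        sum(c * h[s] for h, c in zip(haplotypes, counts)) / total
        for s in range(n_snps)
    ]


def residual_sum_of_squares(observed, predicted):
    """Least-squares misfit between observed and predicted SNP frequencies."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def compositions(n, k):
    """Yield every way to split n individuals among k haplotypes."""
    for bars in combinations(range(n + k - 1), k - 1):
        edges = (-1,) + bars + (n + k - 1,)
        yield tuple(edges[i + 1] - edges[i] - 1 for i in range(k))


def best_fit(haplotypes, observed, pool_size):
    """Return the haplotype counts whose predicted SNP frequencies fit best."""
    return min(
        compositions(pool_size, len(haplotypes)),
        key=lambda counts: residual_sum_of_squares(
            observed, predicted_frequencies(haplotypes, counts)
        ),
    )


# Hypothetical network: an ancestral haplotype and three derived ones.
haplotypes = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 0, 1)]
# Hypothetical noisy SNP frequencies measured in a pool of 10 individuals.
observed = [0.62, 0.19, 0.11]

counts = best_fit(haplotypes, observed, pool_size=10)
print(counts)
```

Exhaustive enumeration is only feasible for small pools and few haplotypes; it is used here because it makes the fit criterion explicit without relying on an external solver.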
CallHap reliably recovered network topologies and haplotype frequencies from pooled samples.
The CallHap pipeline makes efficient use of resources when estimating genetic structure in studies of nonrecombining haplotypes, such as intraspecific variation in chloroplast, mitochondrial, bacterial, or viral DNA.