Department of Integrative Biology, University of Wisconsin-Madison, Madison, Wisconsin.
Institut des Sciences de la Forêt Tempérée, Université du Québec en Outaouais, Ripon, Quebec, Canada.
Mol Ecol Resour. 2018 Nov;18(6):1482-1491. doi: 10.1111/1755-0998.12921. Epub 2018 Jul 20.
Reduced-representation genome sequencing such as RADseq aids the analysis of genomes by reducing the quantity of data, thereby lowering both sequencing costs and computational burdens. RADseq was initially designed for studying genetic variation across genomes at the population level, but it has also proved suitable for interspecific phylogeny reconstruction. RADseq data pose challenges for standard phylogenomic methods, however, because of incomplete coverage of the genome and large amounts of missing data. Alignment-free methods are both efficient and accurate for phylogenetic reconstruction with whole genomes and are especially practical for nonmodel organisms; nonetheless, they have not yet been applied to reduced-representation sequencing data. Here, we test a full-genome assembly- and alignment-free method, AAF, on RADseq data and propose two read-selection procedures that remove reads originating from restriction sites not found in the taxa being compared. We validate these methods using both simulated and real data sets. Read selection improved the accuracy of phylogenetic reconstruction in every simulated scenario and in both real data sets, making AAF as good as or better than a comparable alignment-based method, even though AAF had much lower computational burdens. We also investigated the sources of missing data in RADseq and their effects on phylogeny reconstruction using AAF. The AAF pipeline modified for RADseq and other reduced-representation sequencing data, phyloRAD, is available on GitHub (https://github.com/fanhuan/phyloRAD).
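The abstract describes read selection and the AAF distance only at a high level. The sketch below is a hypothetical illustration, not the phyloRAD code. It assumes one simple selection rule (keep a read only if it shares at least one k-mer with every taxon, as a proxy for a restriction-site locus present in all taxa) and a distance of the form (1/k)·ln(n_t/n_s), where n_s is the number of shared k-mers and n_t is taken here as the size of the smaller k-mer set. The function names `select_reads` and `kmer_distance` and the choice of n_t are assumptions made for illustration.

```python
import math
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of distinct k-mers in one read."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_set(reads: List[str], k: int) -> Set[str]:
    """Union of the k-mers over all reads of one taxon."""
    out: Set[str] = set()
    for r in reads:
        out |= kmers(r, k)
    return out


def select_reads(taxa: Dict[str, List[str]], k: int) -> Dict[str, List[str]]:
    """Hypothetical selection rule (an assumption, not phyloRAD's procedure):
    keep a read only if at least one of its k-mers occurs in every taxon.
    Requires at least one taxon."""
    shared = set.intersection(*(kmer_set(reads, k) for reads in taxa.values()))
    return {name: [r for r in reads if kmers(r, k) & shared]
            for name, reads in taxa.items()}


def kmer_distance(a: Set[str], b: Set[str], k: int) -> float:
    """Assumed AAF-style distance (1/k) * ln(n_t / n_s), i.e. -(1/k) * ln(n_s / n_t).
    n_s = shared k-mers; n_t = size of the smaller k-mer set (an assumption)."""
    n_s = len(a & b)
    n_t = min(len(a), len(b))
    if n_s == 0:
        return math.inf  # no shared k-mers: distance is undefined/infinite
    return math.log(n_t / n_s) / k


# Toy usage: the poly-T and poly-G reads share no k-mers with all taxa.
taxa = {
    "A": ["ACGTACGTAA", "TTTTTTTTTT"],
    "B": ["ACGTACGTAC", "GGGGGGGGGG"],
    "C": ["ACGTACGTAG"],
}
k = 5
kept = select_reads(taxa, k)
d_ac = kmer_distance(kmer_set(kept["A"], k), kmer_set(kept["C"], k), k)
```

In this toy input, the selection step drops the two reads whose k-mers are absent from some taxon, mimicking the removal of reads from restriction sites that are not shared across all taxa being compared.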