Chambers E Anne, Tarvin Rebecca D, Santos Juan C, Ron Santiago R, Betancourth-Cundar Mileidy, Hillis David M, Matz Mikhail V, Cannatella David C
Department of Integrative Biology and Biodiversity Center, University of Texas at Austin, Austin, Texas, USA.
Department of Environmental Science, Policy, and Management and Museum of Vertebrate Zoology, University of California Berkeley, Berkeley, California, USA.
Ecol Evol. 2023 Mar 8;13(3):e9842. doi: 10.1002/ece3.9842. eCollection 2023 Mar.
Restriction-site-associated DNA sequencing (RADseq) has become an accessible way to obtain genome-wide data in the form of single-nucleotide polymorphisms (SNPs) for phylogenetic inference. Nonetheless, how differences in RADseq methods influence phylogenetic estimation is poorly understood, because most comparisons have relied on conceptual predictions rather than empirical tests. We examine how differences in ddRAD and 2bRAD data influence phylogenetic estimation in two non-model frog groups. We compare the impact of method choice on phylogenetic information, missing data, and allelic dropout at different sequencing depths. Given that researchers must balance input (funding, time) with output (amount and quality of data), we also compare laboratory effort, computational time, monetary costs, and the repeatability of library preparation and sequencing. Both 2bRAD and ddRAD methods estimated well-supported trees, even at low sequencing depths, and had comparable amounts of missing data, patterns of allelic dropout, and phylogenetic signal. Compared to ddRAD, 2bRAD produced more repeatable datasets, had simpler laboratory protocols, and had faster bioinformatic assembly overall. However, far fewer parsimony-informative sites per SNP were obtained from 2bRAD data when using native pipelines, highlighting a need for further investigation into the effects of each pipeline on the resulting datasets. Our study underscores the importance of using empirical datasets to compare RADseq methods, including their expected results and theoretical performance, before undertaking costly experiments.