Li Yu-Long, Xue Dong-Xiu, Zhang Bai-Dong, Liu Jin-Xian
CAS Key Laboratory of Marine Ecology and Environmental Sciences, Institute of Oceanology, Chinese Academy of Sciences, 7 Nanhai Road, Qingdao 266071, Shandong, People's Republic of China.
Laboratory for Marine Ecology and Environmental Science, Qingdao National Laboratory for Marine Science and Technology, Qingdao 266071, People's Republic of China.
R Soc Open Sci. 2018 Feb 28;5(2):171589. doi: 10.1098/rsos.171589. eCollection 2018 Feb.
Restriction site-associated DNA (RAD) sequencing is revolutionizing studies in ecological, evolutionary and conservation genomics. However, assembling paired-end RAD reads with randomly sheared ends remains challenging, especially for non-model species with high genetic variance. Here, we present an efficient, optimized approach implemented in a software pipeline, RADassembler, which makes full use of paired-end RAD reads with randomly sheared ends from multiple individuals to assemble RAD contigs. At the clustering stage, RADassembler integrates algorithms for choosing the optimal number of mismatches within and across individuals; at the assembly stage, it uses a two-step assembly approach. RADassembler also uses data reduction and parallelization strategies to improve efficiency. In comparisons with other tools on both simulated and real RAD datasets, RADassembler consistently assembled an appropriate number of high-quality contigs, and more read pairs mapped properly to the assembled contigs. This approach provides an effective tool for handling the complexity of assembling paired-end RAD reads with randomly sheared ends for non-model species in ecological, evolutionary and conservation studies. RADassembler is available at https://github.com/lyl8086/RADscripts.