Plant Phylogenetics and Conservation Group, Center for Integrative Conservation & Yunnan Key Laboratory for Conservation of Tropical Rainforests and Asian Elephants, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Mengla 666303, China; Eastern China Conservation Centre for Wild Endangered Plant Resources, Shanghai Chenshan Botanical Garden, Shanghai, 201602, China; University of Chinese Academy of Sciences, Beijing 100049, China.
Eastern China Conservation Centre for Wild Endangered Plant Resources, Shanghai Chenshan Botanical Garden, Shanghai, 201602, China; Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China; University of Chinese Academy of Sciences, Beijing 100049, China.
Plant Sci. 2024 Jul;344:112109. doi: 10.1016/j.plantsci.2024.112109. Epub 2024 May 3.
Advances in next-generation sequencing (NGS) have significantly reduced the cost and improved the efficiency of obtaining single nucleotide polymorphism (SNP) markers, particularly through restriction site-associated DNA sequencing (RAD-seq). Meanwhile, progress in whole-genome sequencing has made a growing number of reference genomes available for SNP calling. This study used RAD-seq data from 242 individuals of Engelhardia roxburghiana, a tropical tree of the walnut family (Juglandaceae), with SNP calling conducted using the STACKS pipeline. We compared two reference-based approaches, namely using the genome of a closely related species as the reference versus using the genome of the species itself, to evaluate their respective merits and limitations. The number of SNPs obtained differed substantially between the two approaches: the closely related reference yielded approximately an order of magnitude fewer SNPs than the species' own reference. In contrast, the missing rates of individuals and of sites in the final SNP sets showed no significant difference between the two scenarios. These results suggest that the reference genome of the species itself should be prioritized in RAD-seq studies. If it is unavailable, however, the genome of a closely related species is a feasible alternative because of its wide applicability and low missing rate. This study improves the understanding of how the choice of reference genome affects SNP acquisition.
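The abstract compares the missing rate of individuals and of sites between the two SNP sets but does not say how these rates were computed. As a minimal sketch, assuming the conventional definitions (for an individual, the fraction of sites with no genotype call; for a site, the fraction of individuals with no genotype call), the two quantities can be derived from a genotype matrix as follows. The function names and the toy matrix are hypothetical and are not taken from the study or from the STACKS pipeline.

```python
from typing import List, Optional

# Alternate-allele count (0, 1, 2), or None when the genotype call is missing.
Genotype = Optional[int]


def individual_missing_rates(calls: List[List[Genotype]]) -> List[float]:
    """Fraction of sites with no genotype call, per individual (one row each)."""
    return [sum(g is None for g in row) / len(row) for row in calls]


def site_missing_rates(calls: List[List[Genotype]]) -> List[float]:
    """Fraction of individuals with no genotype call, per site (one column each)."""
    n_individuals = len(calls)
    return [sum(g is None for g in column) / n_individuals for column in zip(*calls)]


# Toy data: 3 individuals x 4 sites (hypothetical, not from the study).
toy_calls = [
    [0, 1, None, 2],
    [0, None, None, 1],
    [1, 1, 0, 2],
]

ind_rates = individual_missing_rates(toy_calls)
site_rates = site_missing_rates(toy_calls)

print("per-individual missing rates:", ind_rates)
print("per-site missing rates:", site_rates)
```

Computing both vectors for each SNP set would give the two distributions that a comparison like the one reported above could be based on; the statistical test applied to them is not specified in the abstract.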