DiscoSnp-RAD: de novo detection of small variants for RAD-Seq population genomics.

Authors

Gauthier Jérémy, Mouden Charlotte, Suchan Tomasz, Alvarez Nadir, Arrigo Nils, Riou Chloé, Lemaitre Claire, Peterlongo Pierre

Affiliations

Univ. Rennes, Inria, CNRS, IRISA, Rennes, France.

W. Szafer Institute of Botany, Polish Academy of Sciences, Krakow, Poland.

Publication information

PeerJ. 2020 Jun 10;8:e9291. doi: 10.7717/peerj.9291. eCollection 2020.

Abstract

Restriction site Associated DNA Sequencing (RAD-Seq) is a technique that sequences specific loci along the genome. It is widely used in evolutionary biology because it makes it possible to exploit variant information (mainly Single Nucleotide Polymorphisms, SNPs) from entire populations at a reduced cost. Common RAD-dedicated tools are based on all-vs-all read alignments, which require considerable time and computing resources. We present an original method, DiscoSnp-RAD, that avoids this pitfall: variants are detected by exploiting specific parts of the assembly graph built from the reads, which removes the need for all-vs-all read alignments. We tested the implementation on simulated datasets of increasing size, up to 1,000 samples, and on real RAD-Seq data from 259 specimens of flies, morphologically assigned to seven species. All individuals were successfully assigned to their species using both STRUCTURE and Maximum Likelihood phylogenetic reconstruction. The identified variants also revealed a within-species genetic structure linked to geographic distribution. Furthermore, our results show that DiscoSnp-RAD is significantly faster than state-of-the-art tools. Overall, the results show that DiscoSnp-RAD is suitable for identifying variants from RAD-Seq data, that it does not require time-consuming parameterization steps, and that its completely different principle makes it substantially faster than other tools, in particular on large datasets.
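To make the idea of detecting variants from "specific parts of the assembly graph" more concrete, here is a minimal toy sketch. It assumes the graph is a de Bruijn graph over (k-1)-mers and that a SNP appears as a "bubble": two paths that split at one node and rejoin after k-1 steps. The function names (`build_graph`, `find_snp_bubbles`), the toy reads, and the choice of k are illustrative assumptions; this is not the DiscoSnp-RAD implementation and it ignores sequencing errors, coverage, reverse complements, and indels.

```python
from collections import defaultdict


def build_graph(reads, k):
    """Toy de Bruijn graph: each (k-1)-mer maps to the (k-1)-mers that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_unambiguously(graph, node, steps):
    """Walk `steps` edges from `node`, stopping (None) if a node on the way
    does not have exactly one successor."""
    path = [node]
    for _ in range(steps):
        successors = graph.get(path[-1], ())
        if len(successors) != 1:
            return None
        path.append(next(iter(successors)))
    return path


def find_snp_bubbles(reads, k):
    """Return pairs of allele sequences (length 2k-1) that form simple SNP bubbles."""
    graph = build_graph(reads, k)
    bubbles = []
    for node, successors in list(graph.items()):
        if len(successors) != 2:
            continue  # only nodes with exactly two branches can open a simple bubble
        first, second = sorted(successors)
        # After k-1 steps the variant base has shifted out of the node,
        # so both branches of a SNP bubble should reach the same node.
        path_a = extend_unambiguously(graph, first, k - 1)
        path_b = extend_unambiguously(graph, second, k - 1)
        if path_a and path_b and path_a[-1] == path_b[-1]:
            allele_a = node + "".join(n[-1] for n in path_a)
            allele_b = node + "".join(n[-1] for n in path_b)
            bubbles.append((allele_a, allele_b))
    return bubbles


# Hypothetical toy reads that differ at a single position (A vs G).
toy_reads = ["ACGTACCTGA", "ACGTGCCTGA"]
print(find_snp_bubbles(toy_reads, k=4))
```

The point of the sketch is that no read is ever aligned against another read: the variant is read directly off the graph structure, which is the property the abstract credits for the speed-up.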

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ca43/7293188/d6cb87ffc5ad/peerj-08-9291-g001.jpg
