Accuracy of de novo assembly of DNA sequences from double-digest libraries varies substantially among software.

Affiliations

Program in Ecology, University of Wyoming, Laramie, WY, USA.

Wildlife Genomics and Disease Ecology Laboratory, Department of Veterinary Sciences, University of Wyoming, Laramie, WY, USA.

Publication information

Mol Ecol Resour. 2020 Mar;20(2):360-370. doi: 10.1111/1755-0998.13108. Epub 2019 Nov 25.

DOI:10.1111/1755-0998.13108

PMID:31665547

Abstract

Advances in DNA sequencing have made it feasible to gather genomic data for non-model organisms and large sets of individuals, often using methods for sequencing subsets of the genome. Several of these methods sequence DNA associated with endonuclease restriction sites (various RAD and GBS methods). For use in taxa without a reference genome, these methods rely on de novo assembly of fragments in the sequencing library. Many of the software options available for this application were originally developed for other assembly types and we do not know their accuracy for reduced representation libraries. To address this important knowledge gap, we simulated data from the Arabidopsis thaliana and Homo sapiens genomes and compared de novo assemblies by six software programs that are commonly used or promising for this purpose (ABySS, CD-HIT, Stacks, Stacks2, Velvet and VSEARCH). We simulated different mutation rates and types of mutations, and then applied the six assemblers to the simulated data sets, varying assembly parameters. We found substantial variation in software performance across simulations and parameter settings. ABySS failed to recover any true genome fragments, and Velvet and VSEARCH performed poorly for most simulations. Stacks and Stacks2 produced accurate assemblies of simulations containing SNPs, but the addition of insertion and deletion mutations decreased their performance. CD-HIT was the only assembler that consistently recovered a high proportion of true genome fragments. Here, we demonstrate the substantial difference in the accuracy of assemblies from different software programs and the importance of comparing assemblies that result from different parameter settings.
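The simulation workflow summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the enzyme pair (EcoRI and MseI), the size window, the SNP-only mutation model, the exact-match recovery metric, and all function names are assumptions made for illustration.

```python
import random
import re

# Hypothetical enzyme pair; the abstract does not say which enzymes were simulated.
SITE_A = "GAATTC"  # EcoRI recognition sequence
SITE_B = "TTAA"    # MseI recognition sequence


def double_digest(genome, site_a=SITE_A, site_b=SITE_B, min_len=20, max_len=400):
    """Return fragments bounded by one site_a and one site_b, in either order.

    Simplification: cuts are placed at the start of each recognition site
    rather than at the enzyme's true offset within the site.
    """
    cuts = [(m.start(), "A") for m in re.finditer(site_a, genome)]
    cuts += [(m.start(), "B") for m in re.finditer(site_b, genome)]
    cuts.sort()
    fragments = []
    # Walk over adjacent cut sites and keep only mixed-enzyme fragments
    # whose length falls inside the size-selection window.
    for (pos1, enz1), (pos2, enz2) in zip(cuts, cuts[1:]):
        if enz1 != enz2 and min_len <= pos2 - pos1 <= max_len:
            fragments.append(genome[pos1:pos2])
    return fragments


def add_snps(fragment, rate, rng):
    """Substitute each base with a different base with probability `rate`."""
    out = []
    for base in fragment:
        if rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def proportion_recovered(true_fragments, contigs):
    """Fraction of true fragments found exactly among the assembled contigs.

    Exact string matching is an assumed stand-in for the study's accuracy measure.
    """
    if not true_fragments:
        return 0.0
    contig_set = set(contigs)
    return sum(f in contig_set for f in true_fragments) / len(true_fragments)
```

In use, one would call `double_digest` on a reference sequence, derive mutated copies of each fragment with `add_snps(fragment, rate, random.Random(seed))` to mimic individuals, pass the simulated reads to an assembler, and score its contigs with `proportion_recovered`. Insertions and deletions, which the abstract reports as degrading Stacks and Stacks2, are not modeled here.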

Similar articles

Accuracy of de novo assembly of DNA sequences from double-digest libraries varies substantially among software.

Mol Ecol Resour. 2020 Mar;20(2):360-370. doi: 10.1111/1755-0998.13108. Epub 2019 Nov 25.

Stacking up RADSeq assembly programs: From complete hit to completely abysmal.

Mol Ecol Resour. 2020 Mar;20(2):357-359. doi: 10.1111/1755-0998.13140. Epub 2020 Feb 20.

Benchmarking of de novo assembly algorithms for Nanopore data reveals optimal performance of OLC approaches.

BMC Genomics. 2016 Aug 22;17 Suppl 7(Suppl 7):507. doi: 10.1186/s12864-016-2895-8.

AlignGraph: algorithm for secondary de novo genome assembly guided by closely related references.

Bioinformatics. 2014 Jun 15;30(12):i319-i328. doi: 10.1093/bioinformatics/btu291.

Fragmentation and Coverage Variation in Viral Metagenome Assemblies, and Their Effect in Diversity Calculations.

Front Bioeng Biotechnol. 2015 Sep 17;3:141. doi: 10.3389/fbioe.2015.00141. eCollection 2015.

Comprehensive evaluation of de novo transcriptome assembly programs and their effects on differential gene expression analysis.

Bioinformatics. 2017 Feb 1;33(3):327-333. doi: 10.1093/bioinformatics/btw625.

De novo likelihood-based measures for comparing genome assemblies.

BMC Res Notes. 2013 Aug 22;6:334. doi: 10.1186/1756-0500-6-334.

ABySS 2.0: resource-efficient assembly of large genomes using a Bloom filter.

Genome Res. 2017 May;27(5):768-777. doi: 10.1101/gr.214346.116. Epub 2017 Feb 23.

Sequence comparative analysis using networks: software for evaluating de novo transcript assembly from next-generation sequencing.

Mol Biol Evol. 2013 Aug;30(8):1975-86. doi: 10.1093/molbev/mst087. Epub 2013 May 10.

MetaVelvet-SL: an extension of the Velvet assembler to a de novo metagenomic assembler utilizing supervised learning.

DNA Res. 2015 Feb;22(1):69-77. doi: 10.1093/dnares/dsu041. Epub 2014 Nov 27.

Cited by

Evolution and related pathogenic genes of Pseudodiploöspora longispora on Morchella based on genomic characterization and comparative genomic analysis.

Sci Rep. 2024 Aug 10;14(1):18588. doi: 10.1038/s41598-024-69421-4.

Commonly used Hardy-Weinberg equilibrium filtering schemes impact population structure inferences using RADseq data.

Mol Ecol Resour. 2022 Oct;22(7):2599-2613. doi: 10.1111/1755-0998.13646. Epub 2022 Jun 5.

Recent hybrid speciation at the origin of the narrow endemic Pulmonaria helvetica.

Ann Bot. 2021 Jan 1;127(1):21-31. doi: 10.1093/aob/mcaa145.
