A comparative evaluation of genome assembly reconciliation tools.

Authors

Hind Alhakami, Hamid Mirebrahim, Stefano Lonardi

Affiliation

Department of Computer Science & Engineering, University of California, 900 University Avenue, Riverside, 92521, CA, USA.

Publication details

Genome Biol. 2017 May 18;18(1):93. doi: 10.1186/s13059-017-1213-3.

DOI:10.1186/s13059-017-1213-3

PMID:28521789

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5436433/

Abstract

BACKGROUND

The majority of eukaryotic genomes are unfinished due to the algorithmic challenges of assembling them. A variety of assembly and scaffolding tools are available, but it is not always obvious which tool or parameters to use for a specific genome size and complexity. It is, therefore, common practice to produce multiple assemblies using different assemblers and parameters, then select the best one for public release. A more compelling approach would allow one to merge multiple assemblies with the intent of producing a higher quality consensus assembly, which is the objective of assembly reconciliation.

RESULTS

Several assembly reconciliation tools have been proposed in the literature, but their strengths and weaknesses have never been compared on a common dataset. We fill this need with this work, in which we report on an extensive comparative evaluation of several tools. Specifically, we evaluate contiguity, correctness, coverage, and the duplication ratio of the merged assembly compared to the individual assemblies provided as input.
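The abstract names the evaluation metrics but does not define them. As a rough illustration only, the sketch below computes three of them under commonly used definitions, which are assumptions here rather than the authors' stated method: N50 for contiguity, genome fraction for coverage, and duplication ratio as aligned assembly bases divided by covered reference bases. The function names and the half-open `(start, end)` interval format for alignments on the reference are hypothetical.

```python
from typing import List, Tuple


def n50(contig_lengths: List[int]) -> int:
    """Length of the contig at which the running sum of lengths,
    taken from longest to shortest, first reaches half the assembly size."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def covered_bases(intervals: List[Tuple[int, int]]) -> int:
    """Count reference positions covered by at least one half-open
    [start, end) alignment interval, merging overlaps first."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged block and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered


def genome_fraction(intervals: List[Tuple[int, int]], reference_length: int) -> float:
    """Assumed definition: covered reference bases / reference length."""
    return covered_bases(intervals) / reference_length


def duplication_ratio(intervals: List[Tuple[int, int]]) -> float:
    """Assumed definition: total aligned bases / covered reference bases.
    A value above 1 means some reference regions are represented more than once."""
    aligned = sum(end - start for start, end in intervals)
    return aligned / covered_bases(intervals)


if __name__ == "__main__":
    # Toy values, not data from the paper.
    lengths = [80, 70, 50, 40, 30, 20]
    alignments = [(0, 100), (50, 150), (200, 300)]
    print(n50(lengths))
    print(genome_fraction(alignments, reference_length=400))
    print(duplication_ratio(alignments))
```

A merged assembly that scores a higher N50 but also a duplication ratio well above 1 may have gained contiguity by including redundant sequence, which is why the metrics are read together rather than in isolation. Correctness (misassembly counts) needs full alignment analysis and is not sketched here.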

CONCLUSIONS

None of the tools we tested consistently improved the quality of the input GAGE and synthetic assemblies. Our experiments show an increase in contiguity in the consensus assembly when the original assemblies already have high quality. In terms of correctness, the quality of the results depends on the specific tool, as well as on the quality and the ranking of the input assemblies. In general, the number of misassemblies ranges from being comparable to the best of the input assembly to being comparable to the worst of the input assembly.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4103/5436433/4ab2aadb0df5/13059_2017_1213_Fig1_HTML.jpg
