Evaluation of the impact of Illumina error correction tools on de novo genome assembly.

Author information

Heydari Mahdi, Miclotte Giles, Demeester Piet, Van de Peer Yves, Fostier Jan

Affiliations

Department of Information Technology, Ghent University-imec, IDLab, Ghent, B-9052, Belgium.

Bioinformatics Institute Ghent, Ghent, B-9052, Belgium.

Publication information

BMC Bioinformatics. 2017 Aug 18;18(1):374. doi: 10.1186/s12859-017-1784-8.

DOI:10.1186/s12859-017-1784-8

PMID:28821237

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5563063/

Abstract

BACKGROUND

Recently, many standalone applications have been proposed to correct sequencing errors in Illumina data. The key idea is that downstream analysis tools such as de novo genome assemblers benefit from a reduced error rate in the input data. Surprisingly, a systematic validation of this assumption using state-of-the-art assembly methods is lacking, even for recently published methods.

RESULTS

For twelve recent Illumina error correction tools (EC tools) we evaluated both their ability to correct sequencing errors and their ability to improve de novo genome assembly in terms of contig size and accuracy.
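The abstract does not say which metrics were used for these two evaluations. As a minimal sketch, assuming two measures that are common in this area (the "gain" of an error corrector and the N50 of an assembly), both could be computed as follows. The function names and the example numbers are illustrative assumptions, not values from the paper.

```python
def correction_gain(tp, fp, fn):
    """Gain of an error correction run (assumed metric, not stated in the abstract).

    tp: errors that were correctly fixed
    fp: correct bases that were wrongly changed (newly introduced errors)
    fn: errors that were left uncorrected
    A gain of 1.0 means every error was removed and none was introduced;
    a negative gain means the tool introduced more errors than it removed.
    """
    if tp + fn == 0:
        raise ValueError("the input data contain no errors to correct")
    return (tp - fp) / (tp + fn)


def n50(contig_lengths):
    """N50 of an assembly (assumed contig size metric).

    Returns the length L of the contig at which the contigs of
    length >= L together cover at least half of the total assembly size.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical numbers, for illustration only.
print(correction_gain(tp=90, fp=5, fn=10))
print(n50([2, 3, 4, 5, 6]))
```

Comparing N50 for assemblies built from corrected and uncorrected reads only captures contig size; accuracy would additionally require aligning the contigs to a reference genome.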

CONCLUSIONS

We confirm that most EC tools reduce the number of errors in sequencing data without introducing many new errors. However, we found that many EC tools suffer from poor performance in certain sequence contexts such as regions with low coverage or regions that contain short repeated or low-complexity sequences. Reads overlapping such regions are often ill-corrected in an inconsistent manner, leading to breakpoints in the resulting assemblies that are not present in assemblies obtained from uncorrected data. Resolving this systematic flaw in future EC tools could greatly improve the applicability of such tools.
