K-Mer Spectrum-Based Error Correction Algorithm for Next-Generation Sequencing Data.

Author information

Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

Department of Scientific Computing, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt.

Publication information

Comput Intell Neurosci. 2022 Jul 14;2022:8077664. doi: 10.1155/2022/8077664. eCollection 2022.

DOI:10.1155/2022/8077664

PMID:35875730

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9303089/

Abstract

The first-generation sequencing technique (Sanger) was created in the mid-1970s. It used Advanced BioSystems sequencing devices and Beckman's GeXP genetic testing technology. The second-generation sequencing (2GS) technique arrived just a few years after the first human genome was published in 2003. 2GS devices are much faster than Sanger sequencing equipment, with considerably lower manufacturing costs and far higher throughput in the form of short reads. The third-generation sequencing (3GS) method, initially introduced in 2005, offers further reduced manufacturing costs and higher throughput. Even though sequencing technology has advanced through several generations, it remains error-prone because of the large number of reads. Analyzing this massive amount of data will aid in decoding the secrets of life, detecting infections, developing improved crops, and improving quality of life, among other things. This is a challenging task, complicated not only by the large number of reads but also by the occurrence of sequencing errors. Error correction, which entails identifying and correcting errors in reads, is therefore a crucial step in data processing. As demonstrated in this work, the performance of k-spectrum-based error correction algorithms can be influenced by characteristics such as coverage depth, read length, and genome size. Consequently, time and effort must be put into selecting appropriate error correction approaches for a given NGS data set.
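
To illustrate the general idea behind k-spectrum-based correction, the following is a minimal sketch and not the method evaluated in the paper. It assumes that k-mers seen at least `min_count` times are trustworthy ("solid") and that a read contains at most one substitution error. The function names `kmer_spectrum` and `correct_read`, and the parameter `min_count`, are hypothetical and chosen for illustration only.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def correct_read(read, counts, k, min_count):
    """Try single-base substitutions until every k-mer in the read is solid.

    Illustrative assumption: a k-mer is "solid" if it occurs at least
    min_count times in the spectrum. The read is returned unchanged if it
    is already solid or if no single substitution repairs it.
    """
    def all_solid(seq):
        return all(counts[seq[i:i + k]] >= min_count
                   for i in range(len(seq) - k + 1))

    if all_solid(read):
        return read
    for pos in range(len(read)):
        for base in "ACGT":
            if base == read[pos]:
                continue
            candidate = read[:pos] + base + read[pos + 1:]
            if all_solid(candidate):
                return candidate
    return read


# Toy data: five error-free copies of a read plus one copy with a substitution.
reads = ["ACGTACGT"] * 5 + ["ACGTTCGT"]
spectrum = kmer_spectrum(reads, k=4)
print(correct_read("ACGTTCGT", spectrum, k=4, min_count=2))
```

In this toy setting the choice of `k` and `min_count` is arbitrary. On real data, suitable values would depend on factors such as coverage depth, read length, and genome size, which are the characteristics the abstract identifies as influencing corrector performance.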

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c40/9303089/64ef28512a69/CIN2022-8077664.001.jpg

Similar articles

A comparative study of k-spectrum-based error correction methods for next-generation sequencing data analysis.

Hum Genomics. 2016 Jul 25;10 Suppl 2(Suppl 2):20. doi: 10.1186/s40246-016-0068-0.

Pseudo-Sanger sequencing: massively parallel production of long and near error-free reads using NGS technology.

BMC Genomics. 2013 Oct 17;14(1):711. doi: 10.1186/1471-2164-14-711.

Iterative error correction of long sequencing reads maximizes accuracy and improves contig assembly.

Brief Bioinform. 2017 Jan;18(1):1-8. doi: 10.1093/bib/bbw003. Epub 2016 Feb 10.

Lerna: transformer architectures for configuring error correction tools for short- and long-read genome sequencing.

BMC Bioinformatics. 2022 Jan 6;23(1):25. doi: 10.1186/s12859-021-04547-0.

Improving the sensitivity of long read overlap detection using grouped short k-mer matches.

BMC Genomics. 2019 Apr 4;20(Suppl 2):190. doi: 10.1186/s12864-019-5475-x.

HISEA: HIerarchical SEed Aligner for PacBio data.

BMC Bioinformatics. 2017 Dec 19;18(1):564. doi: 10.1186/s12859-017-1953-9.

A hybrid correcting method considering heterozygous variations by a comprehensive probabilistic model.

BMC Genomics. 2020 Nov 18;21(Suppl 10):753. doi: 10.1186/s12864-020-07008-9.

In search of perfect reads.

BMC Bioinformatics. 2015;16 Suppl 17(Suppl 17):S7. doi: 10.1186/1471-2105-16-S17-S7. Epub 2015 Dec 7.

EC: an efficient error correction algorithm for short reads.

BMC Bioinformatics. 2015;16 Suppl 17(Suppl 17):S2. doi: 10.1186/1471-2105-16-S17-S2. Epub 2015 Dec 7.

Cited by

Prokrustean Graph: A substring index for rapid k-mer size analysis.

bioRxiv. 2024 Dec 20:2023.11.21.568151. doi: 10.1101/2023.11.21.568151.

