Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Department of Scientific Computing, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt.
Comput Intell Neurosci. 2022 Jul 14;2022:8077664. doi: 10.1155/2022/8077664. eCollection 2022.
In the mid-1970s, the first-generation sequencing technique (Sanger) was created. It used Advanced BioSystems sequencing devices and Beckman's GeXP genetic testing technology. The second-generation sequencing (2GS) technique arrived just a few years after the first human genome was published in 2003. 2GS devices are much faster than Sanger sequencing equipment, with considerably lower manufacturing costs and far higher throughput in the form of short reads. The third-generation sequencing (3GS) method, initially introduced in 2005, offers further reduced manufacturing costs and higher throughput. Even though sequencing technology has passed through several generations, it remains error-prone owing to the large number of reads it produces. Analyzing this massive amount of data will aid in decoding the secrets of life, detecting infections, developing improved crops, and improving quality of life, among other things. This is a challenging task, complicated not just by the large number of reads but also by the occurrence of sequencing errors. As a result, error correction is a crucial task in data processing; it entails identifying and correcting errors in reads. As demonstrated in this work, the performance of various k-spectrum-based error correction algorithms can be influenced by a variety of characteristics such as coverage depth, read length, and genome size. Consequently, time and effort must be put into selecting suitable error correction approaches for a given NGS dataset.
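As a rough illustration of the k-spectrum idea mentioned above, the following minimal Python sketch counts every k-mer in a set of reads and flags low-frequency k-mers as likely to contain sequencing errors. It is not the method of any particular tool evaluated in the study: the function names (`kmer_spectrum`, `weak_kmers`, `coverage_depth`), the toy reads, and the frequency threshold are all hypothetical choices made for illustration. The coverage-depth helper uses the common estimate (number of reads × read length / genome size).

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmers(spectrum, threshold):
    """Return k-mers seen fewer than `threshold` times (illustrative cutoff).

    Rare k-mers are treated as suspect because a true genomic k-mer
    should recur across overlapping reads when coverage is sufficient.
    """
    return {kmer for kmer, n in spectrum.items() if n < threshold}


def coverage_depth(num_reads, read_length, genome_size):
    """Estimate average coverage depth as total sequenced bases / genome size."""
    return num_reads * read_length / genome_size


# Toy data (assumed): three identical reads and one read with a single
# substitution (T -> A at position 4).
reads = ["ACGTAC", "ACGTAC", "ACGTAC", "ACGAAC"]
spectrum = kmer_spectrum(reads, k=4)
suspect = weak_kmers(spectrum, threshold=2)
```

In this toy case the three k-mers that overlap the substituted base each occur only once, so they fall below the threshold. A suitable threshold in practice depends on coverage depth, read length, and genome size, which is consistent with the abstract's point that these characteristics influence how k-spectrum-based correctors perform.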