DeepCorr: a novel error correction method for 3GS long reads based on deep learning.

Author information

Wang Rongshu, Chen Jianhua

Affiliation

Department of Electronic Engineering, Information School, Yunnan University, Kunming, Yunnan, China.

Publication information

PeerJ Comput Sci. 2024 Jul 26;10:e2160. doi: 10.7717/peerj-cs.2160. eCollection 2024.

Abstract

Long reads generated by third-generation sequencing (3GS) technologies are used in many biological analyses, where their ultra-long read length plays a vital role. However, their high error rate affects downstream processing. This paper proposes DeepCorr, a novel deep learning-based error correction algorithm for data from both the PacBio and ONT platforms. Its core uses a recurrent neural network to capture long-term dependencies in the long reads, which turns long-read error correction into a multi-class classification task. DeepCorr first aligns high-accuracy short reads to the long reads to generate the corresponding feature vectors and labels, then feeds these vectors to the neural network, and finally uses the trained model for prediction and error correction. DeepCorr produces untrimmed corrected long reads and improves alignment identity while preserving the length advantage. It can capture and make full use of the dependencies to polish bases that are not covered by any aligned short read. On real-world PacBio and ONT benchmark data sets, DeepCorr outperforms state-of-the-art error correction methods and consumes fewer computing resources. It is a comprehensive deep learning-based tool for accurate long-read correction.
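
The abstract does not specify how the feature vectors and labels are encoded, so the following is only a minimal sketch of the general idea, not DeepCorr's actual implementation. It assumes short reads have already been aligned to a long read without gaps, and it builds one feature vector and one class label per long-read position. The function name `pileup_features`, the label set `CLASSES`, the 9-value feature layout, and the majority-vote labels are all hypothetical choices made for illustration.

```python
from collections import Counter

CLASSES = "ACGT-"  # hypothetical label set: four bases plus a deletion symbol


def pileup_features(long_read, alignments):
    """Return (features, labels) with one entry per long-read position.

    alignments: list of (start, segment) pairs. Each segment is a short read
    already placed in long-read coordinates, with '-' marking a position the
    short read deletes. This input format is an assumption of the sketch.
    """
    columns = [Counter() for _ in long_read]
    for start, segment in alignments:
        for offset, base in enumerate(segment):
            pos = start + offset
            if 0 <= pos < len(long_read):
                columns[pos][base] += 1

    features, labels = [], []
    for base, column in zip(long_read, columns):
        depth = sum(column.values())
        # One-hot encoding of the long-read base (4 values).
        one_hot = [1.0 if base == c else 0.0 for c in "ACGT"]
        # Fraction of aligned short reads supporting each class (5 values).
        support = [column[c] / depth if depth else 0.0 for c in CLASSES]
        features.append(one_hot + support)
        # Majority vote over short reads as the class label;
        # None where no short read covers the position.
        if depth:
            labels.append(CLASSES.index(max(CLASSES, key=lambda c: column[c])))
        else:
            labels.append(None)
    return features, labels


features, labels = pileup_features("ACGTA", [(0, "ACC"), (1, "CCT")])
```

In this toy input, position 2 of the long read is `G` while both short reads support `C`, so the label there differs from the long-read base. The last position is covered by no short read and gets no label; in the approach the abstract describes, such positions are the ones a sequence model would have to polish from the surrounding context.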

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a67d/11639150/1f8295229637/peerj-cs-10-2160-g001.jpg
