Accurate self-correction of errors in long reads using de Bruijn graphs.

Authors

Salmela Leena, Walve Riku, Rivals Eric, Ukkonen Esko

Affiliations

Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland.

LIRMM and Institut de Biologie Computationnelle, CNRS and Université Montpellier, Montpellier, France.

Publication

Bioinformatics. 2017 Mar 15;33(6):799-806. doi: 10.1093/bioinformatics/btw321.

Abstract

MOTIVATION

New long-read sequencing technologies, such as PacBio SMRT and Oxford Nanopore, can produce sequencing reads up to 50 000 bp long but with an error rate of at least 15%. Reducing the error rate is necessary for subsequent use of the reads in, e.g., de novo genome assembly. The error correction problem has been tackled either by aligning the long reads against each other or by a hybrid approach that uses the more accurate short reads produced by second-generation sequencing technologies to correct the long reads.

RESULTS

We present an error correction method that uses long reads only. The method consists of two phases: first, we use an iterative alignment-free correction method based on de Bruijn graphs with increasing length of k-mers, and second, the corrected reads are further polished using long-distance dependencies that are found using multiple alignments. According to our experiments, the proposed method is the most accurate one relying on long reads only for read sets with high coverage. Furthermore, when the coverage of the read set is at least 75×, the throughput of the new method is at least 20% higher.
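
To make the first phase more concrete, the following is a toy sketch of iterative correction with increasing k. It is not LoRMA's implementation: the abstract does not describe how reads are corrected within the graph, so this sketch substitutes a much simpler rule (single-base substitutions that make all overlapping k-mers solid) and ignores insertions and deletions, which dominate real long-read errors. All function names, the abundance threshold `min_count`, and the k values are assumptions chosen for illustration.

```python
from collections import Counter


def solid_kmers(reads, k, min_count=2):
    """Return the k-mers that occur at least min_count times across all reads."""
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    return {kmer for kmer, c in counts.items() if c >= min_count}


def de_bruijn_graph(kmers):
    """Build a de Bruijn graph: (k-1)-mer prefix -> set of (k-1)-mer suffixes."""
    graph = {}
    for kmer in kmers:
        graph.setdefault(kmer[:-1], set()).add(kmer[1:])
    return graph


def _window_is_solid(bases, pos, k, solid):
    """True if every k-mer overlapping position pos is solid."""
    first = max(0, pos - k + 1)
    last = min(pos, len(bases) - k)
    return all("".join(bases[i:i + k]) in solid for i in range(first, last + 1))


def correct_read(read, solid, k):
    """Toy rule (an assumption, not the paper's method): substitute one base
    at a time if that makes every k-mer covering the position solid."""
    bases = list(read)
    for pos in range(len(bases)):
        if _window_is_solid(bases, pos, k, solid):
            continue
        original = bases[pos]
        for candidate in "ACGT":
            bases[pos] = candidate
            if _window_is_solid(bases, pos, k, solid):
                break
        else:
            bases[pos] = original  # no substitution helped; leave unchanged
    return "".join(bases)


def iterative_correction(reads, ks=(3, 4, 5), min_count=2):
    """Correct all reads with small k first, then repeat with larger k."""
    for k in ks:
        solid = solid_kmers(reads, k, min_count)
        reads = [correct_read(r, solid, k) for r in reads]
    return reads


if __name__ == "__main__":
    # Three error-free copies of a toy sequence plus one read with a substitution.
    reads = ["ACGTTGCA"] * 3 + ["ACGTAGCA"]
    print(iterative_correction(reads))
```

The point of iterating is that a small k tolerates a high error rate when building the graph, and each pass leaves the reads clean enough for a larger, more specific k to be usable in the next pass. In this sketch the graph is queried only through k-mer membership, which corresponds to checking that an edge exists.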

AVAILABILITY AND IMPLEMENTATION

LoRMA is freely available at http://www.cs.helsinki.fi/u/lmsalmel/LoRMA/ .

CONTACT

leena.salmela@cs.helsinki.fi.
