FLAS: fast and high-throughput algorithm for PacBio long-read self-correction.

Affiliations

Software Engineering Research Center, School of Software Engineering, Beijing Jiaotong University, Beijing, China.

Department of Botany and Plant Sciences, University of California, Riverside, CA, USA.

Publication information

Bioinformatics. 2019 Oct 15;35(20):3953-3960. doi: 10.1093/bioinformatics/btz206.

Abstract

MOTIVATION

Third-generation PacBio long reads have greatly facilitated sequencing projects thanks to their very large read lengths, but they contain about 15% sequencing errors and require error correction. For projects with long reads only, it is challenging both to correct reads quickly and to correct a sufficient number of read bases, i.e. to achieve high-throughput self-correction. MECAT is currently among the fastest self-correction algorithms, but its throughput is relatively low (Xiao et al., 2017).

RESULTS

Here, we introduce FLAS, a wrapper algorithm of MECAT, which achieves high-throughput long-read self-correction while retaining MECAT's speed. FLAS finds additional alignments from the long reads prealigned by MECAT to improve correction throughput, and removes misalignments to preserve accuracy. FLAS also uses the corrected long-read regions to correct the uncorrected ones, further improving throughput. In our performance tests on Escherichia coli, Saccharomyces cerevisiae, Arabidopsis thaliana and human long reads, FLAS achieves 22.0-50.6% larger throughput than MECAT. Compared with self-correction algorithms other than MECAT, FLAS is 2-13× faster and its throughput is 9.8-281.8% larger. Long reads corrected by FLAS can be assembled into contigs with 13.1-29.8% larger N50 sizes than those corrected by MECAT.
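
The N50 comparison above rests on the standard definition of N50: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. A minimal Python sketch of that definition follows. The function names `n50` and `relative_gain` and the contig lengths are hypothetical and purely illustrative; they are not taken from FLAS, MECAT, or the paper's data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (standard definition)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)   # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:             # crossed half of the assembly
            return length


def relative_gain(new, baseline):
    """Percentage by which `new` exceeds `baseline`."""
    return 100.0 * (new - baseline) / baseline


# Made-up contig lengths (arbitrary units), for illustration only.
assembly_a = [100, 60, 50, 40, 30, 20]
assembly_b = [80, 70, 50, 40, 30, 20]

print(n50(assembly_a), n50(assembly_b))
print(round(relative_gain(n50(assembly_b), n50(assembly_a)), 1))
```

A percentage such as "13.1-29.8% larger N50" is a relative gain of this kind, computed between the assemblies obtained from two sets of corrected reads.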

AVAILABILITY AND IMPLEMENTATION

The FLAS software is freely available at https://github.com/baoe/flas.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
