FLAS: fast and high-throughput algorithm for PacBio long-read self-correction.

Affiliations

Software Engineering Research Center, School of Software Engineering, Beijing Jiaotong University, Beijing, China.

Department of Botany and Plant Sciences, University of California, Riverside, CA, USA.

Publication information

Bioinformatics. 2019 Oct 15;35(20):3953-3960. doi: 10.1093/bioinformatics/btz206.

DOI: 10.1093/bioinformatics/btz206
PMID: 30895306

Abstract

MOTIVATION

Third-generation PacBio long reads, with their very large read lengths, have greatly facilitated sequencing projects, but they contain about 15% sequencing errors and need error correction. For projects with long reads only, it is challenging to correct errors quickly, and also challenging to correct a sufficient number of read bases, i.e. to achieve high-throughput self-correction. MECAT is currently among the fastest self-correction algorithms, but its throughput is relatively low (Xiao et al., 2017).
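To make the notion of throughput concrete, the following minimal Python sketch counts the bases in a set of corrected reads and compares two tools. It assumes throughput means the total number of corrected bases a tool outputs, which is how the abstract describes it; the paper's exact definition may differ. The function names and the toy FASTA records are hypothetical and are not taken from FLAS or MECAT.

```python
def count_bases(fasta_text: str) -> int:
    """Return the total number of sequence characters in FASTA-formatted text."""
    total = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        # Header lines start with '>' and carry no sequence.
        if line and not line.startswith(">"):
            total += len(line)
    return total


def relative_gain_percent(new: int, baseline: int) -> float:
    """Percentage by which `new` exceeds `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (new - baseline) / baseline


# Toy corrected-read sets (hypothetical data, for illustration only).
corrected_by_tool_a = ">r1\nACGTACGT\n>r2\nACGT\n"
corrected_by_tool_b = ">r1\nACGTACGTAC\n>r2\nACGTAC\n>r3\nAC\n"

bases_a = count_bases(corrected_by_tool_a)
bases_b = count_bases(corrected_by_tool_b)
gain = relative_gain_percent(bases_b, bases_a)
print(f"tool A: {bases_a} bases, tool B: {bases_b} bases, gain: {gain:.1f}%")
```

A percentage computed this way is the kind of figure a relative throughput comparison between two correction tools would report.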

RESULTS

Here, we introduce FLAS, a wrapper algorithm of MECAT, to achieve high-throughput long-read self-correction while keeping MECAT's fast speed. FLAS finds additional alignments from MECAT prealigned long reads to improve the correction throughput, and removes misalignments to improve accuracy. In addition, FLAS uses the corrected long-read regions to correct the uncorrected ones, which further improves the throughput. In our performance tests on Escherichia coli, Saccharomyces cerevisiae, Arabidopsis thaliana and human long reads, FLAS achieves 22.0-50.6% larger throughput than MECAT. FLAS is 2-13× faster than the self-correction algorithms other than MECAT, and its throughput is also 9.8-281.8% larger. The long reads corrected by FLAS can be assembled into contigs with N50 sizes 13.1-29.8% larger than those obtained with MECAT.
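The assembly comparison above is expressed in terms of N50. As a reminder of what that statistic measures, here is a small Python sketch using the standard definition: the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the total assembly length. The contig lengths are made-up values for illustration, not results from the paper.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Hypothetical contig lengths from two assemblies of the same genome.
assembly_x = [2, 3, 4, 5, 6]
assembly_y = [2, 2, 2, 3, 3, 4, 8, 8]

n50_x = n50(assembly_x)
n50_y = n50(assembly_y)
increase = 100.0 * (n50_y - n50_x) / n50_x
print(f"N50 x = {n50_x}, N50 y = {n50_y}, increase = {increase:.1f}%")
```

A larger N50 means that half of the assembled bases sit in longer contigs, which is why it is commonly used as a contiguity measure when comparing assemblies built from differently corrected reads.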

AVAILABILITY AND IMPLEMENTATION

The FLAS software can be downloaded for free from this site: https://github.com/baoe/flas.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

1
FLAS: fast and high-throughput algorithm for PacBio long-read self-correction.
Bioinformatics. 2019 Oct 15;35(20):3953-3960. doi: 10.1093/bioinformatics/btz206.
2
HALC: High throughput algorithm for long read error correction.
BMC Bioinformatics. 2017 Apr 5;18(1):204. doi: 10.1186/s12859-017-1610-3.
3
MECAT: fast mapping, error correction, and de novo assembly for single-molecule sequencing reads.
Nat Methods. 2017 Nov;14(11):1072-1074. doi: 10.1038/nmeth.4432. Epub 2017 Sep 18.
4
AlignGraph2: similar genome-assisted reassembly pipeline for PacBio long reads.
Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab022.
5
ReMILO: reference assisted misassembly detection algorithm using short and long reads.
Bioinformatics. 2018 Jan 1;34(1):24-32. doi: 10.1093/bioinformatics/btx524.
6
Evaluation of tools for long read RNA-seq splice-aware alignment.
Bioinformatics. 2018 Mar 1;34(5):748-754. doi: 10.1093/bioinformatics/btx668.
7
NextPolish: a fast and efficient genome polishing tool for long-read assembly.
Bioinformatics. 2020 Apr 1;36(7):2253-2255. doi: 10.1093/bioinformatics/btz891.
8
Fec: a fast error correction method based on two-rounds overlapping and caching.
Bioinformatics. 2022 Sep 30;38(19):4629-4632. doi: 10.1093/bioinformatics/btac565.
9
Apollo: a sequencing-technology-independent, scalable and accurate assembly polishing algorithm.
Bioinformatics. 2020 Jun 1;36(12):3669-3679. doi: 10.1093/bioinformatics/btaa179.
10
A spectral algorithm for fast de novo layout of uncorrected long nanopore reads.
Bioinformatics. 2017 Oct 15;33(20):3188-3194. doi: 10.1093/bioinformatics/btx370.

Cited by

1
Repeat and haplotype aware error correction in nanopore sequencing reads with DeChat.
Commun Biol. 2024 Dec 19;7(1):1678. doi: 10.1038/s42003-024-07376-y.
2
Genome assembly in the telomere-to-telomere era.
Nat Rev Genet. 2024 Sep;25(9):658-670. doi: 10.1038/s41576-024-00718-w. Epub 2024 Apr 22.
3
Applications of long-read sequencing to Mendelian genetics.
Genome Med. 2023 Jun 14;15(1):42. doi: 10.1186/s13073-023-01194-3.
4
LCAT: an isoform-sensitive error correction for transcriptome sequencing long reads.
Front Genet. 2023 May 24;14:1166975. doi: 10.3389/fgene.2023.1166975. eCollection 2023.
5
VeChat: correcting errors in long reads using variation graphs.
Nat Commun. 2022 Nov 4;13(1):6657. doi: 10.1038/s41467-022-34381-8.
6
Genome sequence assembly algorithms and misassembly identification methods.
Mol Biol Rep. 2022 Nov;49(11):11133-11148. doi: 10.1007/s11033-022-07919-8. Epub 2022 Sep 23.
7
Nanopore sequencing technology, bioinformatics and applications.
Nat Biotechnol. 2021 Nov;39(11):1348-1365. doi: 10.1038/s41587-021-01108-x. Epub 2021 Nov 8.
8
ARAMIS: From systematic errors of NGS long reads to accurate assemblies.
Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab170.
9
Scalable long read self-correction and assembly polishing with multiple sequence alignment.
Sci Rep. 2021 Jan 12;11(1):761. doi: 10.1038/s41598-020-80757-5.
10
A comprehensive evaluation of long read error correction methods.
BMC Genomics. 2020 Dec 21;21(Suppl 6):889. doi: 10.1186/s12864-020-07227-0.