
Blue: correcting sequencing errors using consensus and context.

Authors

Greenfield Paul, Duesing Konsta, Papanicolaou Alexie, Bauer Denis C

Affiliations

CSIRO Computational Informatics, School of IT, University of Sydney, CSIRO Animal, Food and Health Sciences, Sydney, NSW 2113, and CSIRO Ecosystem Sciences, Canberra, ACT 2601, Australia.

Publication information

Bioinformatics. 2014 Oct;30(19):2723-32. doi: 10.1093/bioinformatics/btu368. Epub 2014 Jun 11.

DOI:10.1093/bioinformatics/btu368
PMID:24919879
Abstract

MOTIVATION

Bioinformatics tools, such as assemblers and aligners, are expected to produce more accurate results when given better quality sequence data as their starting point. This expectation has led to the development of stand-alone tools whose sole purpose is to detect and remove sequencing errors. A good error-correcting tool would be a transparent component in a bioinformatics pipeline, simply taking sequence data in any of the standard formats and producing a higher quality version of the same data containing far fewer errors. It should not only be able to correct all of the types of errors found in real sequence data (substitutions, insertions, deletions and uncalled bases), but it has to be both fast enough and scalable enough to be usable on the large datasets being produced by current sequencing technologies, and work on data derived from both haploid and diploid organisms.

RESULTS

This article presents Blue, an error-correction algorithm based on k-mer consensus and context. Blue can correct substitution, deletion and insertion errors, as well as uncalled bases. It accepts both FASTQ and FASTA formats, and corrects quality scores for corrected bases. Blue also maintains the pairing of reads, both within a file and between pairs of files, making it compatible with downstream tools that depend on read pairing. Blue is memory efficient, scalable and faster than other published tools, and usable on large sequencing datasets. On the tests undertaken, Blue also proved to be generally more accurate than other published algorithms, resulting in more accurately aligned reads and the assembly of longer contigs containing fewer errors. One significant feature of Blue is that its k-mer consensus table does not have to be derived from the set of reads being corrected. This decoupling makes it possible to correct one dataset, such as a small set of 454 mate-pair reads, with the consensus derived from another dataset, such as Illumina reads derived from the same DNA sample. Such cross-correction can greatly improve the quality of small (and expensive) sets of long reads, leading to even better assemblies and higher quality finished genomes.
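The idea of correcting reads against a k-mer consensus table built from a different dataset can be illustrated with a small sketch. This is not Blue's implementation (Blue is written in C# and also handles insertions, deletions, quality scores and read pairing); it is a simplified Python illustration that only fixes single substitutions and uncalled bases. The function names `build_kmer_table` and `correct_read`, the `min_count` threshold and the toy sequences are all assumptions made for this example.

```python
from collections import Counter

BASES = "ACGT"


def build_kmer_table(reads, k):
    """Count every k-mer in the reads (forward strand only, to keep the sketch short)."""
    table = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:  # skip k-mers containing uncalled bases
                table[kmer] += 1
    return table


def correct_read(read, table, k, min_count=2):
    """Repair weakly supported k-mers with the best-supported single substitution.

    Hypothetical simplification: a k-mer seen fewer than `min_count` times in
    the table is treated as erroneous, and only one base per window is changed.
    """
    bases = list(read)
    for start in range(len(bases) - k + 1):
        window = "".join(bases[start:start + k])
        if table[window] >= min_count:  # Counter returns 0 for unseen k-mers
            continue
        best_count, best_pos, best_base = 0, None, None
        for offset in range(k):
            for base in BASES:
                if base == window[offset]:
                    continue
                candidate = window[:offset] + base + window[offset + 1:]
                if table[candidate] > best_count:
                    best_count = table[candidate]
                    best_pos, best_base = start + offset, base
        if best_count >= min_count:
            bases[best_pos] = best_base
    return "".join(bases)


# Toy data: the table comes from one (accurate, deep) read set and is applied
# to a read from another set, mirroring the cross-correction idea.
reference = "ACGTTGCATGCCGATAGCTA"
consensus_reads = [reference] * 5
table = build_kmer_table(consensus_reads, k=5)

noisy_read = reference[:10] + "T" + reference[11:]      # one substitution
uncalled_read = reference[:13] + "N" + reference[14:]   # one uncalled base

print(correct_read(noisy_read, table, k=5))
print(correct_read(uncalled_read, table, k=5))
```

Because `correct_read` takes the table as an argument rather than building it from its own input, the same function works whether the consensus comes from the reads being corrected or from a separate, higher-coverage dataset.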

AVAILABILITY AND IMPLEMENTATION

The code for Blue and its related tools is available from http://www.bioinformatics.csiro.au/Blue. These programs are written in C# and run natively under Windows and under Mono on Linux.
