ECHO: a reference-free short-read error correction algorithm.

Author information

Computer Science Division, University of California-Berkeley, CA 94721, USA.

Publication information

Genome Res. 2011 Jul;21(7):1181-92. doi: 10.1101/gr.111351.110. Epub 2011 Apr 11.

DOI:10.1101/gr.111351.110
PMID:21482625
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3129260/
Abstract

Developing accurate, scalable algorithms to improve data quality is an important computational challenge associated with recent advances in high-throughput sequencing technology. In this study, a novel error-correction algorithm, called ECHO, is introduced for correcting base-call errors in short reads without the need for a reference genome. Unlike most previous methods, ECHO does not require the user to specify parameters whose optimal values are typically unknown a priori. ECHO automatically sets the parameters in the assumed model and estimates error characteristics specific to each sequencing run, while maintaining a running time that is within the range of practical use. ECHO is based on a probabilistic model and is able to assign a quality score to each corrected base. Furthermore, it explicitly models heterozygosity in diploid genomes and provides a reference-free method for detecting bases that originated from heterozygous sites. On both real and simulated data, ECHO is able to improve on the accuracy of previous error-correction methods by several-fold to an order of magnitude, depending on the sequence coverage depth and the position in the read. The improvement is most pronounced toward the end of the read, where previous methods become noticeably less effective. Using a whole-genome yeast data set, it is demonstrated here that ECHO is capable of coping with nonuniform coverage. It is also shown that using ECHO to perform error correction as a preprocessing step considerably facilitates de novo assembly, particularly in the case of low-to-moderate sequence coverage depth.
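
The abstract says ECHO rests on a probabilistic model and assigns a quality score to each corrected base. As a rough illustration of that general idea only, and not of ECHO's actual model, the sketch below picks the most probable base for a single position from the calls of overlapping reads and converts the posterior error probability into a Phred-scaled score. The function name `map_base`, the parameter `error_rate`, the uniform prior, the single fixed error rate, and the quality cap of 60 are all assumptions made for this example; the abstract states that ECHO instead estimates error characteristics for each sequencing run.

```python
import math

BASES = "ACGT"


def map_base(observed, error_rate=0.01):
    """Return (most probable base, Phred-scaled quality) for one position.

    observed:   string of base calls from overlapping reads at this position.
    error_rate: assumed uniform substitution error rate (hypothetical value).
    """
    # Log-likelihood of the observed calls under each candidate true base,
    # with a uniform prior over A, C, G, T. A call matches the true base with
    # probability 1 - e and is each specific wrong base with probability e / 3.
    log_like = {}
    for candidate in BASES:
        total = 0.0
        for call in observed:
            if call == candidate:
                total += math.log(1.0 - error_rate)
            else:
                total += math.log(error_rate / 3.0)
        log_like[candidate] = total

    # Normalize in a numerically stable way to obtain posterior weights.
    peak = max(log_like.values())
    weights = {b: math.exp(v - peak) for b, v in log_like.items()}
    norm = sum(weights.values())

    best = max(BASES, key=lambda b: weights[b])
    # Posterior probability that the chosen base is wrong.
    p_err = sum(w for b, w in weights.items() if b != best) / norm

    # Phred scale: Q = -10 * log10(p_err), capped at 60 (arbitrary choice).
    quality = 60.0 if p_err <= 1e-6 else -10.0 * math.log10(p_err)
    return best, quality
```

Calling `map_base("AAAAC")` favors `A` with high confidence, while an evenly split column such as `"AC"` yields a very low quality score. Under a diploid model like the one the abstract mentions, such a split could also indicate a heterozygous site, which this simplified sketch does not attempt to detect.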
