
HECIL: A Hybrid Error Correction Algorithm for Long Reads with Iterative Learning.

Affiliations

Postdoctoral Researcher, IBM Research, Cambridge, MA, 02142, USA.

Visiting Research Scientist, Mitsubishi Electric Research Laboratories, Cambridge, MA, 02139, USA.

Publication information

Sci Rep. 2018 Jul 2;8(1):9936. doi: 10.1038/s41598-018-28364-3.

DOI:10.1038/s41598-018-28364-3
PMID:29967328
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6028576/
Abstract

Second-generation DNA sequencing techniques generate short reads that can result in fragmented genome assemblies. Third-generation sequencing platforms mitigate this limitation by producing longer reads that span complex and repetitive regions. However, the usefulness of such long reads is limited by their high sequencing error rates. To exploit the full potential of these longer reads, it is imperative to correct the underlying errors. We propose HECIL (Hybrid Error Correction with Iterative Learning), a hybrid error correction framework that determines a correction policy for erroneous long reads, based on optimal combinations of decision weights obtained from short read alignments. We demonstrate that HECIL outperforms state-of-the-art error correction algorithms for an overwhelming majority of evaluation metrics on diverse, real-world data sets including E. coli, S. cerevisiae, and the malaria vector mosquito A. funestus. Additionally, we provide an optional avenue of improving the performance of HECIL's core algorithm by introducing an iterative learning paradigm that enhances the correction policy at each iteration by incorporating knowledge gathered from previous iterations via data-driven confidence metrics assigned to prior corrections.
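
The abstract does not say which decision weights HECIL combines or how its confidence metric is defined, so the following Python sketch is only an illustration of the general idea, not the authors' implementation. It assumes two hypothetical weights per candidate base (the fraction of covering short reads that vote for it, and the mean alignment identity of those reads), combines them with equal weight, and accepts a correction only when the combined score reaches a threshold that is relaxed across iterations. The names `choose_base`, `correct_read`, `iterate`, and the `align` callable are invented for this example, and only substitutions are handled.

```python
from collections import defaultdict


def choose_base(candidates):
    """Pick a base for one long-read position from short-read evidence.

    candidates: list of (base, identity) pairs, one per short read covering
    the position; identity is that read's alignment identity in [0, 1].
    Returns (best_base, score). The two weights and their equal mix are
    assumptions made for illustration only.
    """
    if not candidates:
        return None, 0.0
    counts = defaultdict(int)
    identity_sum = defaultdict(float)
    for base, identity in candidates:
        counts[base] += 1
        identity_sum[base] += identity
    total = len(candidates)
    max_mean = max(identity_sum[b] / counts[b] for b in counts)
    scores = {}
    for b in counts:
        freq_w = counts[b] / total                      # vote share
        mean_ident = identity_sum[b] / counts[b]
        ident_w = mean_ident / max_mean if max_mean > 0 else 0.0
        scores[b] = 0.5 * (freq_w + ident_w)            # equal-weight mix
    # sorted() makes ties deterministic (alphabetically first base wins)
    best = max(sorted(scores), key=scores.get)
    return best, scores[best]


def correct_read(long_read, pileup, threshold):
    """Apply substitutions whose score meets the threshold.

    pileup: dict mapping a 0-based position to its candidate list.
    Returns (corrected_read, number_of_changes).
    """
    bases = list(long_read)
    applied = 0
    for pos, candidates in pileup.items():
        best, score = choose_base(candidates)
        if best is not None and score >= threshold and bases[pos] != best:
            bases[pos] = best
            applied += 1
    return "".join(bases), applied


def iterate(long_read, align, thresholds=(0.9, 0.8)):
    """Re-align and correct once per threshold, strictest first.

    align: hypothetical callable that maps the current long read to a
    pileup dict; in practice this would wrap a short-read aligner.
    """
    for t in thresholds:
        pileup = align(long_read)
        long_read, _ = correct_read(long_read, pileup, t)
    return long_read
```

Calling `iterate` with a strict first threshold leaves low-scoring positions untouched in the first pass, so that later passes re-align against a read that already contains the most trusted corrections.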


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a03/6028576/a79ecbc9c235/41598_2018_28364_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a03/6028576/f9f77d502f7e/41598_2018_28364_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a03/6028576/10906f35f6c5/41598_2018_28364_Fig2_HTML.jpg

Similar articles

1
HECIL: A Hybrid Error Correction Algorithm for Long Reads with Iterative Learning.
Sci Rep. 2018 Jul 2;8(1):9936. doi: 10.1038/s41598-018-28364-3.
2
A hybrid and scalable error correction algorithm for indel and substitution errors of long reads.
BMC Genomics. 2019 Dec 20;20(Suppl 11):948. doi: 10.1186/s12864-019-6286-9.
3
Accurate self-correction of errors in long reads using de Bruijn graphs.
Bioinformatics. 2017 Mar 15;33(6):799-806. doi: 10.1093/bioinformatics/btw321.
4
NmTHC: a hybrid error correction method based on a generative neural machine translation model with transfer learning.
BMC Genomics. 2024 Jun 7;25(1):573. doi: 10.1186/s12864-024-10446-4.
5
Illumina error correction near highly repetitive DNA regions improves de novo genome assembly.
BMC Bioinformatics. 2019 Jun 3;20(1):298. doi: 10.1186/s12859-019-2906-2.
6
HALC: High throughput algorithm for long read error correction.
BMC Bioinformatics. 2017 Apr 5;18(1):204. doi: 10.1186/s12859-017-1610-3.
7
CARE 2.0: reducing false-positive sequencing error corrections using machine learning.
BMC Bioinformatics. 2022 Jun 13;23(1):227. doi: 10.1186/s12859-022-04754-3.
8
In search of perfect reads.
BMC Bioinformatics. 2015;16 Suppl 17(Suppl 17):S7. doi: 10.1186/1471-2105-16-S17-S7. Epub 2015 Dec 7.
9
Hercules: a profile HMM-based hybrid error correction algorithm for long reads.
Nucleic Acids Res. 2018 Nov 30;46(21):e125. doi: 10.1093/nar/gky724.
10
Integration of hybrid and self-correction method improves the quality of long-read sequencing data.
Brief Funct Genomics. 2024 May 15;23(3):249-255. doi: 10.1093/bfgp/elad026.

Cited by

1
NmTHC: a hybrid error correction method based on a generative neural machine translation model with transfer learning.
BMC Genomics. 2024 Jun 7;25(1):573. doi: 10.1186/s12864-024-10446-4.
2
The Application of Long-Read Sequencing to Cancer.
Cancers (Basel). 2024 Mar 25;16(7):1275. doi: 10.3390/cancers16071275.
3
ARAMIS: From systematic errors of NGS long reads to accurate assemblies.
