VeChat: correcting errors in long reads using variation graphs.

Affiliations

Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany.

Life Science & Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands.

Publication details

Nat Commun. 2022 Nov 4;13(1):6657. doi: 10.1038/s41467-022-34381-8.

DOI:10.1038/s41467-022-34381-8
PMID:36333324
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9636371/
Abstract

Error correction is the canonical first step in long-read sequencing data analysis. Current self-correction methods, however, are affected by biases induced by consensus sequences, which mask true variants in lower-frequency haplotypes present in mixed samples. Unlike consensus sequence templates, graph-based reference systems are not affected by such biases, so they do not mistakenly mask true variants as errors. We present VeChat, an approach that implements this idea: VeChat is based on variation graphs, a popular type of data structure for pangenome reference systems. Extensive benchmarking experiments demonstrate that long reads corrected by VeChat contain 4 to 15 times (Pacific Biosciences) and 1 to 10 times (Oxford Nanopore Technologies) fewer errors than reads corrected by state-of-the-art approaches. Further, using VeChat prior to long-read assembly significantly improves the haplotype awareness of the assemblies. VeChat is an easy-to-use, open-source tool and is publicly available at https://github.com/HaploKit/vechat .
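The abstract's central claim, that a single consensus template can erase a true low-frequency variant while a graph that keeps every well-supported allele does not, can be illustrated with a toy example on one alignment column. This is a hypothetical sketch, not VeChat's algorithm: the function names, the `min_support` threshold, and the rule of replacing unsupported bases with the majority allele are all assumptions made for illustration.

```python
from collections import Counter

# Toy illustration (hypothetical, not VeChat's algorithm): one alignment column,
# i.e. the bases observed at the same position across overlapping reads.

def consensus_correct(column):
    """Consensus-style correction: every read receives the majority base."""
    majority = Counter(column).most_common(1)[0][0]
    return [majority for _ in column]

def graph_correct(column, min_support=2):
    """Graph-style correction (simplified): keep one 'node' per allele seen in
    at least min_support reads; only bases on unsupported nodes are treated as
    errors and replaced by the majority allele (an assumed replacement rule)."""
    counts = Counter(column)
    supported = {base for base, n in counts.items() if n >= min_support}
    majority = counts.most_common(1)[0][0]
    return [base if base in supported else majority for base in column]

# 7 reads from haplotype A carry 'C', 2 reads from the rarer haplotype B
# carry a true 'T' variant, and 1 read has a sequencing error 'G'.
column = list("CCCCCCCTTG")

print("".join(consensus_correct(column)))  # CCCCCCCCCC: the T variant is masked
print("".join(graph_correct(column)))      # CCCCCCCTTC: T kept, G corrected
```

Reducing the graph to a single column hides most of what a variation graph captures (alleles linked along read paths across many positions), but it shows why an allele from a minority haplotype can survive graph-based correction when its read support clears a threshold, whereas majority-vote consensus overwrites it.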

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fee7/9636371/31aea7ef8b2a/41467_2022_34381_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fee7/9636371/38eefe59b240/41467_2022_34381_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fee7/9636371/ddae1558ea28/41467_2022_34381_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fee7/9636371/3d60a15f5dba/41467_2022_34381_Fig4_HTML.jpg

Similar articles

1. VeChat: correcting errors in long reads using variation graphs.
   Nat Commun. 2022 Nov 4;13(1):6657. doi: 10.1038/s41467-022-34381-8.
2. Hybrid correction of highly noisy long reads using a variable-order de Bruijn graph.
   Bioinformatics. 2018 Dec 15;34(24):4213-4222. doi: 10.1093/bioinformatics/bty521.
3. Scalable long read self-correction and assembly polishing with multiple sequence alignment.
   Sci Rep. 2021 Jan 12;11(1):761. doi: 10.1038/s41598-020-80757-5.
4. Hybrid-hybrid correction of errors in long reads with HERO.
   Genome Biol. 2023 Dec 1;24(1):275. doi: 10.1186/s13059-023-03112-7.
5. De novo diploid genome assembly using long noisy reads.
   Nat Commun. 2024 Apr 5;15(1):2964. doi: 10.1038/s41467-024-47349-7.
6. ARAMIS: From systematic errors of NGS long reads to accurate assemblies.
   Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab170.
7. A spectral algorithm for fast de novo layout of uncorrected long nanopore reads.
   Bioinformatics. 2017 Oct 15;33(20):3188-3194. doi: 10.1093/bioinformatics/btx370.
8. Benchmarking of de novo assembly algorithms for Nanopore data reveals optimal performance of OLC approaches.
   BMC Genomics. 2016 Aug 22;17 Suppl 7(Suppl 7):507. doi: 10.1186/s12864-016-2895-8.
9. Physical separation of haplotypes in dikaryons allows benchmarking of phasing accuracy in Nanopore and HiFi assemblies with Hi-C data.
   Genome Biol. 2022 Mar 25;23(1):84. doi: 10.1186/s13059-022-02658-2.
10. Chaining for accurate alignment of erroneous long reads to acyclic variation graphs.
   Bioinformatics. 2023 Aug 1;39(8). doi: 10.1093/bioinformatics/btad460.

Cited by

1. BonoboFlow: viral genome assembly and haplotype reconstruction from nanopore reads.
   Bioinform Adv. 2025 May 13;5(1):vbaf115. doi: 10.1093/bioadv/vbaf115. eCollection 2025.
2. Three reference genomes for freshwater diatom ecology and evolution.
   J Phycol. 2025 Apr;61(2):267-274. doi: 10.1111/jpy.13545. Epub 2025 Feb 10.
3. Repeat and haplotype aware error correction in nanopore sequencing reads with DeChat.
