

VeChat: correcting errors in long reads using variation graphs.

Affiliations

Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany.

Life Science & Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands.

Publication information

Nat Commun. 2022 Nov 4;13(1):6657. doi: 10.1038/s41467-022-34381-8.

Abstract

Error correction is the canonical first step in long-read sequencing data analysis. Current self-correction methods, however, are affected by consensus-sequence-induced biases that mask true variants in lower-frequency haplotypes in mixed samples. Unlike consensus sequence templates, graph-based reference systems are not affected by such biases and therefore do not mistake true variants for errors. We present VeChat as an approach that implements this idea: VeChat is based on variation graphs, a popular type of data structure for pangenome reference systems. Extensive benchmarking experiments demonstrate that long reads corrected by VeChat contain 4 to 15 times (Pacific Biosciences) and 1 to 10 times (Oxford Nanopore Technologies) fewer errors than reads corrected by state-of-the-art approaches. Further, using VeChat prior to long-read assembly significantly improves the haplotype awareness of the assemblies. VeChat is an easy-to-use, open-source tool that is publicly available at https://github.com/HaploKit/vechat.
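The consensus bias described in the abstract can be illustrated with a toy example. The sketch below is not VeChat's algorithm (VeChat works on variation graphs); it is a column-wise simplification that assumes equal-length, already aligned reads. The function names and the `min_support` threshold are hypothetical and chosen only for illustration. It contrasts forcing every read onto the majority consensus with keeping any base that is supported by at least a few reads.

```python
from collections import Counter


def column_counts(reads):
    """Count the bases observed in each alignment column."""
    return [Counter(column) for column in zip(*reads)]


def correct_by_consensus(reads):
    """Replace every read with the column-wise majority consensus."""
    counts = column_counts(reads)
    consensus = "".join(c.most_common(1)[0][0] for c in counts)
    return [consensus for _ in reads]


def correct_with_support(reads, min_support=2):
    """Keep a base if at least `min_support` reads share it in that column;
    otherwise replace it with the column majority (hypothetical rule)."""
    counts = column_counts(reads)
    corrected = []
    for read in reads:
        bases = []
        for base, c in zip(read, counts):
            if c[base] >= min_support:
                bases.append(base)  # enough support: treat as a true variant
            else:
                bases.append(c.most_common(1)[0][0])  # singleton: treat as error
        corrected.append("".join(bases))
    return corrected


# Toy mixed sample: a major haplotype, a minor haplotype with one true
# variant (index 3), and one major-haplotype read with a singleton error
# (index 6).
major = "ACGTACGT"
minor = "ACGGACGT"
reads = [major] * 7 + ["ACGTACCT"] + [minor] * 2

consensus_out = correct_by_consensus(reads)
support_out = correct_with_support(reads)

print("consensus keeps minor haplotype:", minor in consensus_out)
print("support-aware keeps minor haplotype:", minor in support_out)
```

In this toy setting the consensus approach rewrites the two minor-haplotype reads to the major sequence, while the support-aware rule removes only the singleton error. Real long reads also contain insertions and deletions and are not pre-aligned, which this sketch ignores.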


