Overlap graph-based generation of haplotigs for diploids and polyploids.

Affiliations

Centrum Wiskunde & Informatica, XG Amsterdam, The Netherlands.

Theoretical Biology and Bioinformatics, Utrecht University, CH Utrecht, The Netherlands.

Publication information

Bioinformatics. 2019 Nov 1;35(21):4281-4289. doi: 10.1093/bioinformatics/btz255.

Abstract

MOTIVATION

Haplotype-aware genome assembly plays an important role in genetics, medicine and various other disciplines, yet generation of haplotype-resolved de novo assemblies remains a major challenge. Beyond distinguishing between errors and true sequential variants, one needs to assign the true variants to the different genome copies. Recent work has pointed out that the enormous quantities of traditional NGS read data have been greatly underexploited in terms of haplotig computation so far, which reflects that methodology for reference-independent haplotig computation has not yet reached maturity.

RESULTS

We present POLYploid genome fitTEr (POLYTE) as a new approach to de novo generation of haplotigs for diploid and polyploid genomes of known ploidy. Our method follows an iterative scheme where in each iteration reads or contigs are joined, based on their interplay in terms of an underlying haplotype-aware overlap graph. Along the iterations, contigs grow while preserving their haplotype identity. Benchmarking experiments on both real and simulated data demonstrate that POLYTE establishes new standards in terms of error-free reconstruction of haplotype-specific sequence. As a consequence, POLYTE outperforms state-of-the-art approaches in various relevant aspects, where advantages become particularly distinct in polyploid settings.
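
The iterative scheme described above can be illustrated with a toy sketch. This is not POLYTE's implementation: it assumes error-free reads and uses an exact suffix-prefix match as a crude stand-in for a haplotype-aware overlap test, so reads that disagree at a variant site inside their overlap are never joined. The function names (`exact_overlap`, `best_pair`, `merge_iteratively`), the `min_len` threshold, and the two example sequences are all hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple


def exact_overlap(a: str, b: str, min_len: int) -> int:
    """Longest k >= min_len where the last k bases of a equal the first k of b, else 0."""
    lowest = max(min_len, 1)
    for k in range(min(len(a), len(b)), lowest - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def best_pair(seqs: List[str], min_len: int) -> Optional[Tuple[int, int, int]]:
    """Return (overlap, i, j) for the longest suffix-prefix overlap, or None."""
    best: Optional[Tuple[int, int, int]] = None
    for i, a in enumerate(seqs):
        for j, b in enumerate(seqs):
            if i == j:
                continue
            k = exact_overlap(a, b, min_len)
            if k and (best is None or k > best[0]):
                best = (k, i, j)
    return best


def merge_iteratively(reads: List[str], min_len: int) -> List[str]:
    """Repeatedly join the best-overlapping pair until no overlap remains.

    Assumption: a mismatch inside an overlap means "different haplotype",
    which only holds for error-free toy data.
    """
    seqs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        hit = best_pair(seqs, min_len)
        if hit is None:
            return seqs
        k, i, j = hit
        merged = seqs[i] + seqs[j][k:]  # contig grows by the non-overlapping tail
        seqs = [s for idx, s in enumerate(seqs) if idx not in (i, j)] + [merged]


if __name__ == "__main__":
    # Two made-up haplotypes that differ at a single position (index 7).
    hap1 = "ACGTTGCAGGCTATCA"
    hap2 = "ACGTTGCTGGCTATCA"
    # Every toy read spans the variant, so each overlap is haplotype-specific.
    reads = [h[s:s + 10] for h in (hap1, hap2) for s in (0, 3, 6)]
    for contig in merge_iteratively(reads, min_len=5):
        print(contig)
```

With these toy reads the loop ends with two contigs, one per haplotype, because every overlap of at least five bases covers the variant position. Real data breaks both simplifications: sequencing errors require tolerant overlap scoring, and reads that do not span a variant cannot be assigned to a haplotype by overlap alone.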

AVAILABILITY AND IMPLEMENTATION

POLYTE is freely available as part of the HaploConduct package at https://github.com/HaploConduct/HaploConduct, implemented in Python and C++.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
