Phasing nanopore genome assembly by integrating heterozygous variations and Hi-C data.

Author information

Zhang Jun, Nie Fan, Luo Feng, Wang Jianxin

Affiliations

School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China.

Xiangjiang Laboratory, Changsha, Hunan 410205, China.

Publication information

Bioinformatics. 2024 Nov 28;40(12). doi: 10.1093/bioinformatics/btae712.

Abstract

MOTIVATION

Haplotype-resolved genome assemblies are vital resources in many research domains, including genomics, medicine, and pangenomics. Algorithms that use Hi-C data to generate haplotype-resolved assemblies are particularly advantageous because Hi-C data are readily available. Existing methods rely primarily on mapping quality to filter out uninformative Hi-C alignments, which may be susceptible to sequencing errors. A high mapping quality threshold discards many informative Hi-C alignments, whereas a low threshold compromises the accuracy of the retained alignments. Maintaining high accuracy while retaining as many Hi-C alignments as possible is therefore challenging.

RESULTS

In our experiments, heterozygous variations played an important role in filtering uninformative Hi-C alignments. Here, we introduce Diphase, a novel phasing tool that uses heterozygous variations to accurately identify the Hi-C alignments that are informative for phasing and to extend primary/alternate assemblies. Diphase combines mapping quality and heterozygous variations to filter uninformative Hi-C alignments, thereby improving both phasing accuracy and the detection of switches. To validate its performance, we compared Diphase with FALCON-Phase and GFAse on various human datasets. The results show that Diphase achieves a longer phased block N50 and higher phasing accuracy while maintaining a lower Hamming error rate.
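The idea of combining mapping quality with heterozygous variations can be illustrated with a minimal sketch. This is not Diphase's actual algorithm, which the abstract does not specify. It assumes a simplified rule: a Hi-C read pair is kept only if both mates reach a relaxed mapping quality floor and each mate's aligned interval contains at least one heterozygous site. The names `Mate`, `covers_het_site`, `filter_pairs`, and the `min_mapq` default are all hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Mate:
    """One mate of a Hi-C read pair (hypothetical, simplified record)."""
    contig: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    mapq: int


def covers_het_site(mate, het_sites):
    """Return True if the mate's aligned interval contains a heterozygous site.

    het_sites maps a contig name to a sorted list of 0-based positions.
    """
    positions = het_sites.get(mate.contig, [])
    i = bisect_left(positions, mate.start)  # first site at or after start
    return i < len(positions) and positions[i] < mate.end


def filter_pairs(pairs, het_sites, min_mapq=1):
    """Keep pairs that pass a relaxed MAPQ floor and cover het sites on both mates.

    Assumed rule for illustration only; the real tool may weigh these
    signals differently.
    """
    kept = []
    for left, right in pairs:
        if min(left.mapq, right.mapq) < min_mapq:
            continue  # at least one mate is below the mapping quality floor
        if covers_het_site(left, het_sites) and covers_het_site(right, het_sites):
            kept.append((left, right))
    return kept


# Toy data: heterozygous positions per contig and three Hi-C pairs.
het_sites = {"ctg1": [120, 980], "ctg2": [40]}
pairs = [
    (Mate("ctg1", 100, 250, 12), Mate("ctg2", 0, 150, 9)),   # both mates cover a het site
    (Mate("ctg1", 300, 450, 60), Mate("ctg2", 0, 150, 60)),  # left mate covers no het site
    (Mate("ctg1", 900, 1050, 0), Mate("ctg2", 0, 150, 60)),  # left mate has MAPQ 0
]
kept = filter_pairs(pairs, het_sites)
print(len(kept))
```

In this toy input, the first pair is kept despite modest mapping quality because both mates overlap a heterozygous site, whereas the second pair is dropped despite high mapping quality because one mate carries no allele information. That is the kind of trade-off the abstract describes, under the assumptions stated above.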

AVAILABILITY AND IMPLEMENTATION

The source code of Diphase is available at https://github.com/zhangjuncsu/Diphase.
