Current Progress in Phased Genome Assembly from Long-Read DNA Sequencing Data.

Authors

Diaz-Riaño Jorge Ivan, Duitama Jorge

Affiliation

Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia.

Publication

Methods Mol Biol. 2025;2955:51-70. doi: 10.1007/978-1-0716-4702-8_4.

Abstract

Genome assembly is a core task in the field of genomics. The availability of long-read sequencing technologies has enabled the construction of high-quality assemblies of complex genomes, including phasing of heterozygous contigs. This chapter provides an overview of the main algorithmic techniques for genome assembly. It starts by revisiting the two main data structures used for unphased genome assembly, namely, the overlap graph and the de Bruijn graph. Then, it describes current protocols and sequencing alternatives, such as trio data and Hi-C, that provide long-range information for scaffolding and phasing. Next, it describes the metrics that have been developed to evaluate completeness, accuracy, and base-pair quality. Finally, this chapter provides detailed information on the core algorithmic ideas of four different tools for phased genome assembly (FALCON, HiCanu, Hifiasm, and NGSEP). A review of classic techniques for phasing based on aligned reads is included to provide context on the basic concepts needed to understand the algorithms implemented in the genome assemblers.
