Department of Computer Science, Vanderbilt University, Nashville, TN, USA.
Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong.
Methods Mol Biol. 2023;2590:161-182. doi: 10.1007/978-1-0716-2819-5_11.
Phasing is essential for determining the origin of each set of alleles in an individual's whole-genome sequencing data. As such, it provides key information about the causes of hereditary diseases and the sources of individual variability. Recent technical breakthroughs in linked-read sequencing (referred to as co-barcoding in other chapters of the book), long-read sequencing, and downstream analysis have brought the goal of accurate and complete phasing within reach. Here we review recent progress in the assembly and phasing of personal genomes based on linked reads, along with related applications. Motivated by current limitations in generating high-quality diploid assemblies and detecting variants, a new suite of software tools, Aquila, was developed to take full advantage of linked-read sequencing technology. The overarching goal of Aquila is to exploit the strengths of linked-read technology, including long-range connectivity and inherent phasing of variants, for reference-assisted local de novo assembly at the whole-genome scale. The diploid nature of the assemblies facilitates detection and phasing of genetic variation, including single nucleotide variations (SNVs), small insertions and deletions (indels), and structural variants (SVs). An extension of Aquila, Aquila_stLFR, targets another newly developed linked-read sequencing technology, single-tube long-fragment read (stLFR). AquilaSV, a region-based diploid assembly approach, characterizes structural variants by assembling one target region at a time. Lastly, we introduce HAPDeNovo, a program that exploits phasing information from linked-read sequencing to improve detection of de novo mutations. Use of these tools is expected to harness the advantages of linked-read technology, improve phasing, and advance variant discovery.