Benchmarking multi-platform sequencing technologies for human genome assembly.

Affiliations

Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China.

BGI Research, Shenzhen 518083, China.

Publication information

Brief Bioinform. 2023 Sep 20;24(5). doi: 10.1093/bib/bbad300.

Abstract

Genome assembly is a computational technique that pieces together deoxyribonucleic acid (DNA) fragments generated by sequencing technologies to create a comprehensive and precise representation of the entire genome. Generating a high-quality human reference genome is a crucial prerequisite for understanding human biology, and it is also vital for downstream analysis of genomic variation. Over the past few decades, many efforts have been made to create a complete, gapless human reference genome using a diverse range of advanced sequencing technologies. Tools are available to improve the quality of haploid and diploid human genome assemblies at several stages, including contig assembly, polishing of contig errors, scaffolding and variant phasing. Although several studies have investigated the pros and cons of different assembly strategies, selecting the appropriate tools and technologies remains a daunting task. The goal of this paper was to benchmark various strategies for human genome assembly by combining sequencing technologies and tools on two publicly available samples (NA12878 and NA24385) from Genome in a Bottle. We then compared their performance in terms of continuity, accuracy, completeness, variant calling and phasing. We observed that PacBio HiFi long reads are the optimal choice for generating an assembly with low base errors. Oxford Nanopore long reads, on the other hand, produced the most continuous contigs, but these contigs may require further polishing to improve their quality. We recommend polishing with short reads, rather than with the long reads themselves, to improve the base accuracy of contigs assembled from Oxford Nanopore long reads. Hi-C is the best choice for chromosome-level scaffolding because it captures longer-range DNA connectedness than 10× linked-reads and Bionano optical maps. However, a combination of multiple technologies can be used to further improve the quality and completeness of a genome assembly. For human diploid genome assembly, hifiasm is the best tool when using PacBio HiFi and Hi-C data. Looking to the future, we expect that further advancements in human diploid assemblers will leverage the power of PacBio HiFi reads and other technologies with long-range DNA connectedness to enable the generation of high-quality, chromosome-level and haplotype-resolved human genome assemblies.
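
The abstract compares assemblies on continuity and accuracy without naming the metrics used. As an illustration only, the sketch below computes two metrics that are widely used for these purposes: contig N50 for continuity and a Phred-scaled quality value (QV) for base accuracy. The choice of these two metrics and the function names `n50` and `phred_qv` are assumptions made for this example and are not taken from the paper.

```python
import math


def n50(contig_lengths):
    """Return the contig N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contig lengths given")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


def phred_qv(error_bases, total_bases):
    """Return the Phred-scaled quality value, -10 * log10(error rate).

    An assembly with no observed errors has no finite QV, so infinity
    is returned in that case.
    """
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    if error_bases == 0:
        return math.inf
    return -10 * math.log10(error_bases / total_bases)


if __name__ == "__main__":
    # Hypothetical contig lengths (in bases), for illustration only.
    toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
    print("N50:", n50(toy_contigs))
    print("QV:", phred_qv(error_bases=1, total_bases=1000))
```

A higher N50 indicates longer, more continuous contigs, and a higher QV indicates fewer base errors; in practice both values are usually reported by dedicated assembly evaluation tools rather than computed by hand.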
