Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies.

Affiliations

Genome Informatics Section, Computational and Statistical Genomics Branch, NHGRI, NIH, Bethesda, MD, USA.

UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA.

Publication information

Nat Methods. 2022 Jun;19(6):687-695. doi: 10.1038/s41592-022-01440-3. Epub 2022 Mar 31.

DOI:10.1038/s41592-022-01440-3

PMID:35361931

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9812399/

Abstract

Advances in long-read sequencing technologies and genome assembly methods have enabled the recent completion of the first telomere-to-telomere human genome assembly, which resolves complex segmental duplications and large tandem repeats, including centromeric satellite arrays in a complete hydatidiform mole (CHM13). Although derived from highly accurate sequences, evaluation revealed evidence of small errors and structural misassemblies in the initial draft assembly. To correct these errors, we designed a new repeat-aware polishing strategy that made accurate assembly corrections in large repeats without overcorrection, ultimately fixing 51% of the existing errors and improving the assembly quality value from 70.2 to 73.9 measured from PacBio high-fidelity and Illumina k-mers. By comparing our results to standard automated polishing tools, we outline common polishing errors and offer practical suggestions for genome projects with limited resources. We also show how sequencing biases in both high-fidelity and Oxford Nanopore Technologies reads cause signature assembly errors that can be corrected with a diverse panel of sequencing technologies.
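
The quality values quoted in the abstract are on a log scale, so the step from 70.2 to 73.9 is easier to read as a per-base error rate. The sketch below is an illustration only: it assumes the conventional Phred relation QV = -10 log10(P_error) and does not reproduce the paper's k-mer based measurement. The function names are hypothetical. The ratio it computes is a different quantity from the 51% of errors reported as fixed, so the two figures need not agree.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability.

    Assumes the standard Phred definition QV = -10 * log10(P_error).
    """
    return 10 ** (-qv / 10)


def error_rate_to_qv(p_error: float) -> float:
    """Inverse conversion: per-base error probability to Phred-scaled QV."""
    return -10 * math.log10(p_error)


# QV values quoted in the abstract (before and after polishing).
qv_before, qv_after = 70.2, 73.9

rate_before = qv_to_error_rate(qv_before)
rate_after = qv_to_error_rate(qv_after)

# Relative drop in the implied per-base error rate.
reduction = 1 - rate_after / rate_before

print(f"implied error rate before polishing: {rate_before:.2e}")
print(f"implied error rate after polishing:  {rate_after:.2e}")
print(f"relative reduction in error rate:    {reduction:.1%}")
```

Under this assumption, a gain of 3.7 QV units corresponds to a drop of roughly 57% in the implied per-base error rate, which shows why a change of a few QV points at this level is substantial.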

Similar articles

Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies.

Nat Methods. 2022 Jun;19(6):687-695. doi: 10.1038/s41592-022-01440-3. Epub 2022 Mar 31.

Polishing the Oxford Nanopore long-read assemblies of bacterial pathogens with Illumina short reads to improve genomic analyses.

Genomics. 2021 May;113(3):1366-1377. doi: 10.1016/j.ygeno.2021.03.018. Epub 2021 Mar 11.

NPGREAT: assembly of human subtelomere regions with the use of ultralong nanopore reads and linked-reads.

BMC Bioinformatics. 2022 Dec 16;23(1):545. doi: 10.1186/s12859-022-05081-3.

Constructing telomere-to-telomere diploid genome by polishing haploid nanopore-based assembly.

Nat Methods. 2024 Apr;21(4):574-583. doi: 10.1038/s41592-023-02141-1. Epub 2024 Mar 8.

HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads.

Genome Res. 2020 Sep;30(9):1291-1305. doi: 10.1101/gr.263566.120. Epub 2020 Aug 14.

NextPolish2: A Repeat-aware Polishing Tool for Genomes Assembled Using HiFi Long Reads.

Genomics Proteomics Bioinformatics. 2024 May 9;22(1). doi: 10.1093/gpbjnl/qzad009.

How low can you go? Short-read polishing of Oxford Nanopore bacterial genome assemblies.

Microb Genom. 2024 Jun;10(6). doi: 10.1099/mgen.0.001254.

Evaluation of strategies for the assembly of diverse bacterial genomes using MinION long-read sequencing.

BMC Genomics. 2019 Jan 9;20(1):23. doi: 10.1186/s12864-018-5381-7.

Evaluating long-read de novo assembly tools for eukaryotic genomes: insights and considerations.

Gigascience. 2022 Dec 28;12. doi: 10.1093/gigascience/giad100. Epub 2023 Nov 24.

Nanopore ultra-long sequencing and adaptive sampling spur plant complete telomere-to-telomere genome assembly.

Mol Plant. 2024 Nov 4;17(11):1773-1786. doi: 10.1016/j.molp.2024.10.008. Epub 2024 Oct 16.

Cited by

Benchmarking of bioinformatics tools for the hybrid assembly of human and non-human whole-genome sequencing data.

Comput Struct Biotechnol J. 2025 Jul 13;27:3099-3109. doi: 10.1016/j.csbj.2025.07.020. eCollection 2025.

A telomere-to-telomere gap-free genome assembly of the endangered humphead wrasse (Cheilinus undulatus).

Sci Data. 2025 Jul 11;12(1):1194. doi: 10.1038/s41597-025-05475-x.

Highly accurate assembly polishing with DeepPolisher.

Genome Res. 2025 Jul 1;35(7):1595-1608. doi: 10.1101/gr.280149.124.

Verkko2 integrates proximity-ligation data with long-read De Bruijn graphs for efficient telomere-to-telomere genome assembly, phasing, and scaffolding.

Genome Res. 2025 Jun 12. doi: 10.1101/gr.280383.124.

Telomere-to-telomere genome of common bean (Phaseolus vulgaris L., YP4).

Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giaf001.

Large tandem repeats of grass frog (Rana temporaria) in silico and in situ.

BMC Genomics. 2025 May 6;26(1):445. doi: 10.1186/s12864-025-11643-5.

Establishing genome sequencing and assembly for non-model and emerging model organisms: a brief guide.

Front Zool. 2025 Apr 17;22(1):7. doi: 10.1186/s12983-025-00561-7.

Host-encoded DNA methyltransferases modify the epigenome and host tropism of invading phages.

iScience. 2025 Mar 22;28(4):112264. doi: 10.1016/j.isci.2025.112264. eCollection 2025 Apr 18.

A pangenome reveals LTR repeat dynamics as a major driver of genome evolution in Chenopodium.

Plant Genome. 2025 Mar;18(1):e70010. doi: 10.1002/tpg2.70010.

A telomere-to-telomere genome assembly of the protandrous hermaphrodite blackhead seabream, Acanthopagrus schlegelii.

Sci Data. 2025 Feb 27;12(1):350. doi: 10.1038/s41597-025-04602-y.
