Long-read sequence and assembly of segmental duplications.

Affiliations

Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.

The McDonnell Genome Institute at Washington University, Washington University School of Medicine, St. Louis, MO, USA.

Publication information

Nat Methods. 2019 Jan;16(1):88-94. doi: 10.1038/s41592-018-0236-3. Epub 2018 Dec 17.

DOI: 10.1038/s41592-018-0236-3

PMID: 30559433

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6382464/

Abstract

We have developed a computational method based on polyploid phasing of long sequence reads to resolve collapsed regions of segmental duplications within genome assemblies. Segmental Duplication Assembler (SDA; https://github.com/mvollger/SDA ) constructs graphs in which paralogous sequence variants define the nodes and long-read sequences provide attraction and repulsion edges, enabling the partition and assembly of long reads corresponding to distinct paralogs. We apply it to single-molecule, real-time sequence data from three human genomes and recover 33-79 megabase pairs (Mb) of duplications in which approximately half of the loci are diverged (<99.8%) compared to the reference genome. We show that the corresponding sequence is highly accurate (>99.9%) and that the diverged sequence corresponds to copy-number-variable paralogs that are absent from the human reference genome. Our method can be applied to other complex genomes to resolve the last gene-rich gaps, improve duplicate gene annotation, and better understand copy-number-variant genetic diversity at the base-pair level.
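
The graph idea in the abstract (PSVs as nodes, reads supplying attraction and repulsion edges, then partitioning reads by paralog) can be illustrated with a toy sketch. This is not SDA's implementation. The read representation (a dict mapping a hypothetical integer PSV index to 1 if the read carries the paralog-specific base and 0 if it carries the consensus base), the rule for counting attraction and repulsion, the `min_reads` threshold, the net signed edge weight, and the greedy merging heuristic used as a stand-in for correlation clustering are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_psv_graph(reads, min_reads=2):
    """Return signed edge weights between pairs of PSVs (illustrative only).

    Hypothetical input: each read is a dict {psv_index: allele}, where allele
    is 1 if the read carries the paralog-specific base at that PSV and 0 if it
    carries the consensus base. Only PSVs the read overlaps are present.
    """
    attract = defaultdict(int)  # reads with the paralog-specific base at both PSVs
    repel = defaultdict(int)    # reads with the paralog-specific base at exactly one
    for read in reads:
        for a, b in combinations(sorted(read), 2):
            both = read[a] + read[b]
            if both == 2:
                attract[(a, b)] += 1
            elif both == 1:
                repel[(a, b)] += 1
    weights = {}
    for pair in set(attract) | set(repel):
        support = attract[pair] + repel[pair]
        if support >= min_reads:  # ignore weakly supported PSV pairs
            # Positive net weight acts as attraction, negative as repulsion.
            weights[pair] = attract[pair] - repel[pair]
    return weights


def greedy_correlation_clustering(psvs, weights):
    """Greedily merge PSV clusters while the summed weight between them is positive."""
    clusters = [{p} for p in psvs]

    def between(c1, c2):
        return sum(weights.get((min(a, b), max(a, b)), 0) for a in c1 for b in c2)

    while True:
        best_weight, best_pair = 0, None
        for i, j in combinations(range(len(clusters)), 2):
            w = between(clusters[i], clusters[j])
            if w > best_weight:
                best_weight, best_pair = w, (i, j)
        if best_pair is None:  # no merge adds net attraction within a cluster
            return clusters
        i, j = best_pair  # i < j, so popping j leaves index i valid
        clusters[i] |= clusters.pop(j)


def assign_reads(reads, clusters):
    """Assign each read to the PSV cluster whose paralog-specific bases it best matches."""
    groups = defaultdict(list)
    if not clusters:
        return {}
    for idx, read in enumerate(reads):
        # +1 for each cluster PSV where the read has the paralog-specific base, -1 otherwise.
        scores = [
            sum(1 if read[p] == 1 else -1 for p in cluster if p in read)
            for cluster in clusters
        ]
        best = max(range(len(clusters)), key=scores.__getitem__)
        if scores[best] > 0:  # reads with no supporting PSV stay unassigned
            groups[best].append(idx)
    return dict(groups)


if __name__ == "__main__":
    # Toy collapse: PSVs 0-2 mark paralog A, PSVs 3-5 mark paralog B.
    reads = [
        {0: 1, 1: 1, 3: 0}, {1: 1, 2: 1, 4: 0}, {0: 1, 2: 1, 5: 0},  # paralog A
        {0: 0, 3: 1, 4: 1}, {1: 0, 4: 1, 5: 1}, {2: 0, 3: 1, 5: 1},  # paralog B
    ]
    weights = build_psv_graph(reads, min_reads=1)
    psvs = sorted({p for read in reads for p in read})
    clusters = greedy_correlation_clustering(psvs, weights)
    print(clusters)
    print(assign_reads(reads, clusters))
```

On the toy data the PSVs split into {0, 1, 2} and {3, 4, 5}, and reads 0, 1, 2 and reads 3, 4, 5 land in separate groups; the abstract describes assembling each such partition separately. Greedy merging is used here only because it is short; it carries no optimality guarantee for correlation clustering.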

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/020c/6382464/0a934ca41690/nihms-1511761-f0001.jpg
