Haplotype-resolved diverse human genomes and integrated analysis of structural variation.

Author information

Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany.

Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA.

Publication information

Science. 2021 Apr 2;372(6537). doi: 10.1126/science.abf7117. Epub 2021 Feb 25.

Abstract

Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterized 130 of the most active mobile element source elements and found that 63% of all SVs arise through homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.
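The contiguity figure quoted in the abstract (the minimum contig length needed to cover 50% of the genome) is commonly called N50. The following is a minimal sketch of how such a statistic can be computed from a list of contig lengths. The function name `n50`, the optional `target_size` parameter, and the toy lengths are illustrative assumptions, not taken from the paper or its pipeline.

```python
def n50(contig_lengths, target_size=None):
    """Return the smallest contig length L such that contigs of
    length >= L together cover at least half of the target size.

    target_size is a hypothetical parameter: if omitted, the total
    assembly length is used (N50); if set to an estimated genome
    size, the result corresponds to what is often called NG50.
    """
    total = sum(contig_lengths) if target_size is None else target_size
    running = 0
    # Walk contigs from longest to shortest, accumulating covered bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contigs do not cover half of the target size")


# Toy example (made-up lengths, not real assembly data).
toy_contigs = [40, 30, 20, 10]
print(n50(toy_contigs))
```

In the toy example, the two longest contigs (40 and 30) are enough to cover half of the 100 total bases, so the statistic is the length of the second one.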




