Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA.
Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305, USA.
Cell Rep Methods. 2023 Aug 2;3(8):100543. doi: 10.1016/j.crmeth.2023.100543. eCollection 2023 Aug 28.
The human pangenome, a new reference sequence, addresses many limitations of the current GRCh38 reference. Its first release is based on 94 high-quality haploid assemblies from individuals of diverse backgrounds. We used a k-mer indexing strategy to compare multiple assemblies, including the pangenome reference, GRCh38, and CHM13, a telomere-to-telomere reference assembly. This k-mer indexing approach identified a valuable collection of sequences conserved across all assemblies, which we call "pan-conserved segment tags" (PSTs). By examining the intervals between these segments, we distinguished highly conserved genomic segments from those with structurally related polymorphisms. We found 60,764 polymorphic intervals with unique geo-ethnic features in the pangenome reference. In this study, we used these ultra-conserved sequences (PSTs) to link the human pangenome assemblies to reference genomes. This approach enables any sequence of interest within the pangenome to be examined using the reference genome as a comparative framework.
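The idea of finding shared k-mer tags and then classifying the intervals between them can be illustrated with a small Python sketch. This is not the authors' pipeline. It rests on several assumptions, and every function name here is hypothetical. A "tag" is taken to be a k-mer that occurs exactly once in every assembly, counted on the forward strand only. Tags are ordered by their position in a chosen reference. An interval between two consecutive tags is called "conserved" only if the spanned sequence is identical in every assembly, and "polymorphic" otherwise. The k used in the demo is tiny, for readability only.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple


def unique_kmers(seq: str, k: int) -> Set[str]:
    """Return the k-mers that occur exactly once in seq (forward strand only)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer for kmer, n in counts.items() if n == 1}


def find_tags(assemblies: Dict[str, str], k: int) -> Set[str]:
    """Hypothetical tag criterion: a k-mer that is single-copy in every assembly."""
    tag_sets = [unique_kmers(seq, k) for seq in assemblies.values()]
    return set.intersection(*tag_sets) if tag_sets else set()


def tag_positions(seq: str, tags: Set[str], k: int) -> Dict[str, int]:
    """Map each tag to its (single) start offset in seq."""
    return {seq[i:i + k]: i for i in range(len(seq) - k + 1) if seq[i:i + k] in tags}


def classify_intervals(
    assemblies: Dict[str, str], ref: str, k: int
) -> List[Tuple[str, str, Dict[str, int], str]]:
    """Walk consecutive tag pairs in reference order and label each interval."""
    tags = find_tags(assemblies, k)
    pos = {name: tag_positions(seq, tags, k) for name, seq in assemblies.items()}
    ordered = sorted(pos[ref], key=lambda t: pos[ref][t])
    results = []
    for left, right in zip(ordered, ordered[1:]):
        segments: Dict[str, Optional[str]] = {}
        spans: Dict[str, int] = {}
        for name, seq in assemblies.items():
            start, end = pos[name][left], pos[name][right]
            spans[name] = end - start
            # A non-positive span means the two tags are out of order in this assembly.
            segments[name] = seq[start:end + k] if end > start else None
        conserved = None not in segments.values() and len(set(segments.values())) == 1
        results.append((left, right, spans, "conserved" if conserved else "polymorphic"))
    return results


if __name__ == "__main__":
    ref_seq = "AACGGTCATGCCTAGT"
    # Toy "haplotype" carrying an 8-bp insertion relative to the reference.
    hap_seq = ref_seq[:8] + "TTTTTTTT" + ref_seq[8:]
    for left, right, spans, status in classify_intervals(
        {"ref": ref_seq, "hap1": hap_seq}, "ref", k=4
    ):
        print(left, right, spans, status)
```

In this toy example the insertion destroys the reference 4-mers that overlap it, so they drop out of the tag set. The interval between the surviving flanking tags (`TCAT` and `TGCC`) is the only one reported as polymorphic. Its length differs between the two assemblies (3 vs. 11 bases between tag starts). Indexing single-copy shared k-mers this way keeps the comparison alignment-free. The cost is that repetitive regions yield no tags.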