The Structure of Simple Satellite Variation in the Human Genome and Its Correlation With Centromere Ancestry.

Author information

Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA.

Publication information

Genome Biol Evol. 2024 Aug 5;16(8). doi: 10.1093/gbe/evae153.

Abstract

Although repetitive DNA makes up much of the human genome, it is difficult to study because repetitive short reads are hard to assemble and align. We deployed k-Seek, software that detects tandem repeats embedded in single reads, on 2,504 human genomes from the 1000 Genomes Project to quantify the variation and abundance of simple satellites (repeat units <20 bp). We find that the ancestral monomer of Human Satellite 3 makes up the largest portion of simple satellite content in humans (mean of ∼8 Mb). We discovered ∼50,000 rare tandem repeats that are not detected in the T2T-CHM13v2.0 assembly, including undescribed variants of telomeric and pericentromeric repeats. The most abundant repeats are broadly homogeneous across populations, except for AG-rich repeats, which are more abundant in African individuals. We also find cliques of highly similar AG- and AT-rich satellites that are interspersed, form higher-order structures, and covary in copy number across individuals, likely through concerted amplification via unequal exchange. Finally, we use pericentromeric polymorphisms to estimate centromeric genetic relatedness between individuals and find a strong predictive relationship between centromeric lineages and pericentromeric simple satellite abundances. In particular, the abundances of the ancestral monomers of Human Satellite 2 and Human Satellite 3 correlate with clusters of centromeric ancestry on chromosome 16 and chromosome 9, with some clusters structured by population. These results provide new descriptions of the population dynamics that underlie the evolution of simple satellites in humans.
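
To make the idea of detecting tandem repeats embedded in single reads concrete, the following is a minimal Python sketch. It is not k-Seek's actual algorithm, which the abstract does not describe; it is an illustrative greedy scan that assumes perfect (mismatch-free) repeats. The function names, the `min_copies` threshold, and the choice to pool rotations and reverse complements of a unit under one canonical name are all assumptions made for this example. The `max_unit=19` default reflects the abstract's definition of simple satellites as repeat units <20 bp.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_unit(unit: str) -> str:
    """Smallest rotation of the unit or of its reverse complement.

    Assumption for this sketch: all rotations and both strands of a repeat
    unit are counted as the same satellite.
    """
    revcomp = unit.translate(_COMPLEMENT)[::-1]
    return min(s[i:] + s[:i] for s in (unit, revcomp) for i in range(len(s)))


def is_primitive(unit: str) -> bool:
    """True if the unit is not a repeat of a shorter unit ('ATAT' is not)."""
    return (unit + unit).find(unit, 1) == len(unit)


def find_tandem_repeats(read: str, max_unit: int = 19, min_copies: int = 3):
    """Yield (canonical_unit, copies) for perfect tandem runs in one read.

    Greedy left-to-right scan: at each position, try every unit length up to
    max_unit and keep the candidate that covers the most bases.
    """
    i, n = 0, len(read)
    while i < n:
        best = None  # (unit_length, copies)
        for k in range(1, max_unit + 1):
            unit = read[i:i + k]
            if len(unit) < k or not is_primitive(unit):
                continue
            copies = 1
            while read[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_copies and (best is None or k * copies > best[0] * best[1]):
                best = (k, copies)
        if best is None:
            i += 1
            continue
        k, copies = best
        yield canonical_unit(read[i:i + k]), copies
        i += k * copies  # skip past the run just reported


def count_units(reads) -> Counter:
    """Sum repeat copies per canonical unit across a collection of reads."""
    totals = Counter()
    for read in reads:
        for unit, copies in find_tandem_repeats(read):
            totals[unit] += copies
    return totals
```

For example, `count_units(["CCGT" + "AATGG" * 4 + "C", "TTAGGG" * 3])` tallies four copies of one 5-bp unit and three copies of the telomeric hexamer under its canonical name. A per-genome abundance estimate would additionally need normalization for sequencing depth and tolerance for sequencing errors, neither of which this sketch attempts.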

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0556/11305138/16ba1e00ac61/evae153f1.jpg
