Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA.
Illumina Inc., San Diego, CA, USA.
Sci Data. 2020 Sep 8;7(1):294. doi: 10.1038/s41597-020-00633-9.
Significant progress has been made in elucidating single nucleotide polymorphism diversity in the human population. However, the majority of the variation space in the genome is structural and remains partially elusive. One form of structural variation is tandem repeats (TRs). Expansions of TRs are responsible for over 40 diseases, but we hypothesize that these represent only a fraction of the pathogenic repeat expansions that exist. Here we use ExpansionHunter Denovo to characterize long or expanded TR variation in 1,115 human genomes, as well as in a replication cohort of 2,504 genomes. We found that individual genomes typically harbor several rare, large TRs, generally in non-coding regions of the genome. These large TRs are enriched near Alu elements. The vast majority of them appear to be expansions of smaller TRs that are already present in the reference genome. We provide this TR profile as a resource for comparison with undiagnosed rare disease genomes in order to detect novel disease-causing repeat expansions.