Rady Children's Institute for Genomic Medicine, San Diego, CA, United States of America.
Cedars-Sinai Medical Center, Los Angeles, CA, United States of America.
PLoS One. 2023 Jan 26;18(1):e0279430. doi: 10.1371/journal.pone.0279430. eCollection 2023.
Short tandem repeats (STRs) have been found to play a role in a wide range of complex traits and genetic diseases. We examined variability in the lengths of more than 850,000 STR loci in 996 children with suspected genetic disorders and 1,178 parents across six ancestral groups: Africans, Europeans, East Asians, Admixed Americans, Non-admixed Americans, and Pacific Islanders. For each STR locus, we compared allele lengths between and within ancestry groups. Relative to Europeans, Admixed Americans had the most similar STR lengths, with only 623 positions significantly expanded or contracted, while divergence was highest in Africans, with 4,933 chromosomal positions contracted or expanded. We also examined probands to identify STR expansions at known pathogenic loci. The genes TCF4, AR, and DMPK showed significant expansions, with lengths 250% greater than their respective average allele lengths, in 49, 162, and 11 individuals, respectively. All 49 individuals with an expansion in TCF4 and six individuals with an expansion in DMPK had allele lengths longer than the known pathogenic length for these genes. Next, we identified individuals with significant expansions in highly conserved loci across all ancestries. Eighty loci in conserved regions met the criteria for divergence. Two of these individuals were found to have exonic STR expansions: one in ZBTB4 and the other in SLC9A7, which is associated with X-linked mental retardation. Finally, we used parent-child trios to detect and analyze de novo mutations. In total, we observed 3,219 de novo expansions, defined as proband allele lengths greater than twice the longest parental allele length. This work helps lay the foundation for understanding STR lengths genome-wide across ancestries and may help identify new disease genes and novel mechanisms of pathogenicity in known disease genes.
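The two length thresholds stated in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Trio` class and both function names are hypothetical, allele lengths are assumed to be available as plain integers per locus, and "250% greater than average" is read literally as more than 3.5 times the mean, which is an assumption because the abstract does not spell out the arithmetic. The de novo check compares the longest proband allele against twice the longest parental allele, as described in the text.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Trio:
    """Allele lengths at a single STR locus for one parent-child trio.

    Hypothetical container; units (repeat count or base pairs) only need
    to be consistent across the three family members.
    """
    proband: Sequence[int]
    mother: Sequence[int]
    father: Sequence[int]


def is_de_novo_expansion(trio: Trio, factor: float = 2.0) -> bool:
    """True if the longest proband allele exceeds `factor` times the
    longest allele seen in either parent (factor=2.0 per the abstract)."""
    longest_parental = max([*trio.mother, *trio.father])
    return max(trio.proband) > factor * longest_parental


def exceeds_mean_length(length: int, mean_length: float,
                        pct_greater: float = 250.0) -> bool:
    """True if `length` is more than `pct_greater` percent above the mean.

    Assumption: "250% greater" means length > mean * (1 + 2.5).
    """
    return length > mean_length * (1.0 + pct_greater / 100.0)


if __name__ == "__main__":
    # Made-up lengths for illustration only, not data from the study.
    trio = Trio(proband=(10, 45), mother=(10, 12), father=(11, 20))
    print(is_de_novo_expansion(trio))
    print(exceeds_mean_length(36, mean_length=10.0))
```

Both thresholds are exposed as parameters so that a different reading of the criteria (for example, 2.5 times the mean instead of 3.5 times) can be tried without changing the logic.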