Hiatt Laurel, Weisburd Ben, Dolzhenko Egor, VanNoy Grace E, Kurtas Edibe Nehir, Rehm Heidi L, Quinlan Aaron, Dashnow Harriet
medRxiv. 2024 May 21:2024.05.21.24307682. doi: 10.1101/2024.05.21.24307682.
Approximately 3% of the human genome consists of repetitive elements called tandem repeats (TRs), which include short tandem repeats (STRs) of 1-6 bp motifs and variable number tandem repeats (VNTRs) of 7+ bp motifs. TR variants contribute to several dozen mono- and polygenic diseases but remain understudied and "enigmatic," particularly relative to single nucleotide variants. It remains comparatively challenging to interpret the clinical significance of TR variants. Although existing resources provide portions of the data necessary for interpretation at disease-associated loci, it is currently difficult or impossible to efficiently retrieve the additional details critical to proper interpretation, such as motif pathogenicity, disease penetrance, and age of onset distributions. It is also often unclear how to apply population information to analyses. We present STRchive (S-T-archive, http://strchive.org/), a dynamic resource consolidating information on TR disease loci in humans from research literature, up-to-date clinical resources, and large-scale genomic databases, with the goal of streamlining TR variant interpretation at disease-associated loci. We apply STRchive, including pathogenic thresholds, motif classification, and clinical phenotypes, to a gnomAD cohort of ∼18.5k individuals genotyped at 60 disease-associated loci. Through detailed literature curation, we demonstrate that the majority of TR diseases affect children despite being thought of as adult diseases. Additionally, we show that pathogenic genotypes can be found within gnomAD which do not necessarily overlap with known disease prevalence, and we leverage STRchive to interpret locus-specific findings therein. We apply a diagnostic blueprint empowered by STRchive to relevant clinical vignettes, highlighting possible pitfalls in TR variant interpretation. As a living resource, STRchive is maintained by experts, takes community contributions, and will evolve as understanding of TR diseases progresses.