Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China.
University of Chinese Academy of Sciences, Beijing, 100049, China.
Nat Commun. 2023 Apr 12;14(1):2092. doi: 10.1038/s41467-023-37690-8.
Short tandem repeats (STRs) are abundant and highly mutable in the human genome. Many STR loci have been associated with a range of human genetic disorders. However, most population-scale studies of STR variation in humans have focused on European-ancestry cohorts or have been limited by sequencing depth. Here, we present a comprehensive map of 366,013 polymorphic STRs (pSTRs) constructed from 6487 deeply sequenced genomes, comprising 3983 Chinese samples (31.5x, NyuWa) and 2504 samples from the 1000 Genomes Project (33.3x, 1KGP). We found that STR mutations were affected by motif length, chromosome context and epigenetic features. We identified 3273 and 1117 pSTRs whose repeat numbers were associated with gene expression and 3'UTR alternative polyadenylation, respectively. We also performed population analyses, investigated population-differentiated signatures, and genotyped 60 known disease-causing STRs. Overall, this study extends the scale of known STR variation in humans and advances our understanding of the semantics of STRs.