Mitina Aleksandra, Engchuan Worrawat, Trost Brett, Pellecchia Giovanna, Scherer Stephen W, Yuen Ryan K C
Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON, Canada.
The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON, Canada.
Genome Biol. 2025 Sep 12;26(1):279. doi: 10.1186/s13059-025-03754-9.
Short tandem repeat (STR) length is a known determinant of pathogenicity in a variety of human disorders. The repeat sequence itself can modulate disease severity and penetrance; however, the broader impact of STR sequence variation on gene expression in the general population remains poorly understood.
Here, we analyze the sequence composition of STRs across two general population cohorts of unrelated individuals (n = 3,150) and report that ~7% of STRs exhibit sequence variability, with distinct patterns observed among different ethnic groups. These variable repeats are more prone to expansion and are frequently found in proximity to Alu elements. Notably, STRs with variable motifs are often found near splice junctions of genes involved in brain and neuronal functions. This is supported by the differential expression of genes associated with neuron and cellular projection functions, driven by the presence of distinct STR sequences.
Our findings underscore the previously unrecognized role of STR sequence variability in modulating gene expression and contributing to human phenotypic diversity.
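To make the notion of STR "sequence variability" concrete, the following is a minimal illustrative sketch in Python, not the authors' pipeline (the abstract does not describe their method). It assumes alleles are already available as plain strings and that the repeat unit length is known. The function names `motif_composition` and `is_sequence_variable` are hypothetical, and the sketch ignores motif rotation, partial units, and sequencing error.

```python
from collections import Counter
from typing import Iterable


def motif_composition(allele: str, motif_len: int) -> Counter:
    """Count each non-overlapping unit of length `motif_len` in an allele.

    Assumption: the allele starts in phase with the repeat unit; any
    trailing partial unit is ignored.
    """
    units = [
        allele[i:i + motif_len]
        for i in range(0, len(allele) - motif_len + 1, motif_len)
    ]
    return Counter(units)


def is_sequence_variable(alleles: Iterable[str], motif_len: int) -> bool:
    """Return True if more than one distinct repeat unit is seen across alleles.

    This is a toy definition of sequence variability, chosen for illustration.
    """
    seen = set()
    for allele in alleles:
        seen.update(motif_composition(allele, motif_len))
    return len(seen) > 1


# Hypothetical example: a pure CAG repeat versus one with a CAA interruption.
pure = ["CAGCAGCAG", "CAGCAG"]
mixed = ["CAGCAGCAG", "CAGCAACAG"]
print(is_sequence_variable(pure, 3))
print(is_sequence_variable(mixed, 3))
```

Under this toy definition, two alleles of different length but identical motif count as length-variable only, whereas an allele carrying an interrupting unit makes the locus sequence-variable.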