Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran.
IZBI, Interdisciplinary Centre for Bioinformatics, Universität Leipzig, Härtelstr. 16-18, 04107, Leipzig, Germany.
Sci Rep. 2022 Sep 14;12(1):15480. doi: 10.1038/s41598-022-19878-y.
The human SBF1 (SET binding factor 1) gene, also known as MTMR5, is predominantly expressed in the brain, and its epigenetic dysregulation is linked to late-onset neurocognitive disorders (NCDs), such as Alzheimer's disease. This gene contains a (GCC)-repeat in the interval between +1 and +60 relative to the transcription start site (SBF1-202, ENST00000380817.8). We sequenced the SBF1 (GCC)-repeat in a sample of 542 Iranian individuals, consisting of patients with late-onset NCDs (N = 260) and controls (N = 282). While multiple alleles were detected at this locus, the 8- and 9-repeat alleles predominated, making up >95% of the allele pool in both groups. Among a number of anomalies, the allele distribution was significantly different in the NCD group versus controls (Fisher's exact p = 0.006), primarily as a result of enrichment of the 8-repeat in the former. The genotype distribution departed from the Hardy-Weinberg principle in both groups (p < 0.001), and was significantly different between the two groups (Fisher's exact p = 0.001). We detected a significantly low frequency of the 8/9 genotype in both groups, a higher frequency of this genotype in the NCD group than in controls, and a reversed order of the 8/8 versus 9/9 genotype frequencies in the NCD group relative to controls. Biased heterozygous/heterozygous ratios were also detected for the 6/8 versus 6/9 genotypes (in favor of 6/8) across the human samples studied (Fisher's exact p = 0.0001). Bioinformatics studies revealed that the number of (GCC)-repeats may change the RNA secondary structure and interaction sites, at least across human exon 1. This short tandem repeat (STR) was expanded beyond 2 repeats specifically in primates. In conclusion, we report an indication of a novel biological phenomenon: selection against certain heterozygous genotypes at an STR locus in humans. We also report different allele and genotype distributions at this STR locus in late-onset NCD versus controls. In view of the location of this STR in the 5' untranslated region, RNA/RNA or RNA/DNA heterodimer formation of the involved genotypes and alternative RNA processing and/or translation should be considered.
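As an illustration of what a departure from the Hardy-Weinberg principle means for a genotype such as 8/9, the following minimal Python sketch computes expected genotype counts from observed ones and a chi-square statistic. The counts are hypothetical placeholders, not data from this study, and the function names `hwe_expected` and `chi_square` are illustrative; the abstract does not state which test the authors used for the Hardy-Weinberg comparison.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hwe_expected(genotype_counts):
    """Return allele frequencies and Hardy-Weinberg expected genotype counts.

    genotype_counts maps (allele_a, allele_b) tuples, with allele_a <= allele_b,
    to the number of individuals observed with that genotype.
    """
    n = sum(genotype_counts.values())
    allele_counts = Counter()
    for (a, b), count in genotype_counts.items():
        allele_counts[a] += count
        allele_counts[b] += count
    total_alleles = 2 * n
    freqs = {a: c / total_alleles for a, c in allele_counts.items()}

    expected = {}
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        p = freqs[a] * freqs[b]
        # Homozygotes: p^2 * n; heterozygotes: 2pq * n
        expected[(a, b)] = n * (p if a == b else 2 * p)
    return freqs, expected


def chi_square(observed, expected):
    """Pearson chi-square statistic over all genotype classes with expected > 0."""
    return sum(
        (observed.get(g, 0) - e) ** 2 / e for g, e in expected.items() if e > 0
    )


# Hypothetical counts for illustration only (NOT taken from the study).
observed = {(8, 8): 40, (8, 9): 20, (9, 9): 40}
freqs, expected = hwe_expected(observed)
stat = chi_square(observed, expected)
print(freqs)      # allele frequencies
print(expected)   # expected genotype counts under Hardy-Weinberg
print(stat)       # chi-square statistic
```

In this made-up example the heterozygote is observed far less often than its expected count, which is the kind of deficit the abstract describes for the 8/9 genotype. With rare alleles, as at this locus, an exact test would usually be preferred over the chi-square approximation.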