Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran.
Department of Health Education and Promotion, Faculty of Health Sciences, Tabriz University of Medical Sciences, Tabriz, Iran.
Gerontology. 2020;66(5):514-522. doi: 10.1159/000509471. Epub 2020 Sep 2.
Approximately 2% of human core promoter short tandem repeats (STRs) reach lengths of ≥6 repeats, which may in part be a result of adaptive evolutionary processes and natural selection. A single-exon transcript of the human nescient helix-loop-helix 2 (NHLH2) gene is flanked by the longest CA-repeat detected in a human protein-coding gene core promoter (Ensembl transcript ID: ENST00000369506.1). NHLH2 is involved in several biological and pathological pathways, such as motivated exercise, obesity, and diabetes.
The allele and genotype distributions of the NHLH2 CA-repeat were investigated by sequencing in 655 Iranian subjects, comprising patients with late-onset neurocognitive disorder (NCD) as a clinical entity (n = 290) and matched controls (n = 365). The evolutionary trend of the CA-repeat was also studied across vertebrates.
The allele range was 9 to 25 repeats in the NCD cases and 12 to 24 repeats in the controls. At a frequency of 0.56, the 21-repeat allele was the predominant allele in the controls. While the 21-repeat allele was also the predominant allele in the NCD patients, we detected a significant decline in the frequency (p < 0.0001) and homozygosity (p < 0.006) of this allele in this group. Furthermore, 12 genotypes were detected across 16 patients (5.5% of the entire NCD sample) and not in the controls (disease-only genotypes; p < 0.0003), each containing at least one extreme allele. The extreme alleles were at 9, 12, 13, 18, and 19 repeats (extreme short end), and 23, 24, and 25 repeats (extreme long end), and their frequencies ranged between 0.001 and 0.04. The frequency of the 21-repeat allele dropped significantly to 0.09 in the disease-only genotype compartment (p < 0.0001). Evolutionarily, while the maximum length of the NHLH2 CA-repeat was 11 repeats in non-primates, this CA-repeat was ≥14 repeats in primates and reached its maximum length in humans.
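As a rough check of the disease-only genotype figures, the sketch below recomputes the share of NCD patients carrying such genotypes (16 of 290) and runs a one-sided Fisher's exact test on the 2×2 table of carriers versus non-carriers in cases and controls. The abstract does not state which statistical test was used, so the choice of Fisher's exact test and the function name `fisher_one_sided` are assumptions made for illustration only.

```python
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, where X is the
    count in the top-left cell with all margins held fixed.
    (Hypothetical helper; not taken from the study.)
    """
    row1 = a + b            # total in first row (cases)
    col1 = a + c            # total in first column (carriers)
    n = a + b + c + d       # grand total
    denom = comb(n, col1)
    upper = min(row1, col1)
    numer = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return numer / denom


# Counts taken from the abstract: 16 of 290 cases and 0 of 365 controls
# carried a disease-only genotype.
cases_carriers, cases_total = 16, 290
controls_carriers, controls_total = 0, 365

carrier_share = 100 * cases_carriers / cases_total
p_value = fisher_one_sided(
    cases_carriers,
    cases_total - cases_carriers,
    controls_carriers,
    controls_total - controls_carriers,
)

print(f"carrier share among cases: {carrier_share:.1f}%")
print(f"one-sided Fisher p-value below 0.0003: {p_value < 0.0003}")
```

Under these assumptions the recomputed share and significance threshold agree with the values reported above; a different test (for example a chi-squared test with continuity correction) could give a different exact p-value.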
We propose a novel locus for late-onset NCD at the exceptionally long CA-STR in the NHLH2 core promoter, and natural selection at this locus. Furthermore, there was an indication of genotypes at this locus that were unambiguously linked to late-onset NCD. This is the first instance of natural selection in favor of a predominantly abundant STR allele in humans and of its differential distribution in late-onset NCD.