Shimada Makoto K, Sanbonmatsu Ryoko, Yamaguchi-Kabata Yumi, Yamasaki Chisato, Suzuki Yoshiyuki, Chakraborty Ranajit, Gojobori Takashi, Imanishi Tadashi
Institute for Comprehensive Medical Science, Fujita Health University, 1-98 Dengakugakubo, Kutsukake-cho, Toyoake, Aichi, 470-1192, Japan.
National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi Koto-ku, Tokyo, 135-0064, Japan.
Mol Genet Genomics. 2016 Oct;291(5):1851-69. doi: 10.1007/s00438-016-1219-7. Epub 2016 Jun 11.
Short tandem repeats (STRs) comprise repeats of one to several base pairs. Because strand slippage during DNA synthesis makes STRs highly mutable, the number of repeat units changes rapidly over evolutionary time, and the range of repeat-number variation is shaped directly by selection pressure. However, two questions remain: why are STRs that cause repeat expansion diseases maintained in the human population, and why are these diseases limited to neurodegenerative diseases? By evaluating genome-wide selection pressure on STRs using a database we constructed, we identified two different patterns in the relationship between repeat-number polymorphisms at the DNA level and at the amino-acid level; both patterns are evolutionary consequences of avoiding the formation of harmful long STRs. First, poly-proline (poly-P) repeats are encoded by a mixture of degenerate codons. Second, long poly-glutamine (poly-Q) repeats are favored at the protein level; at the DNA level, however, STRs encoding long poly-Qs are frequently divided by synonymous SNPs. Furthermore, genes encoding poly-Qs with repeat polymorphism were specifically and significantly enriched for the biological processes of apoptosis and neurodevelopment. This suggests that polymorphic and/or long poly-Q stretches have a specific molecular function. Given that the poly-Qs causing expansion diseases were longer than other poly-Qs, even in healthy subjects, our results indicate that the evolutionary benefits of long and/or polymorphic poly-Q stretches outweigh the risk that long CAG repeats predispose to pathological hyper-expansion. Molecular pathways in neurodevelopment that require long and polymorphic poly-Q stretches may provide a clue to why poly-Q expansion diseases are limited to neurodegenerative diseases.
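The distinction the abstract draws between the protein level and the DNA level for poly-Q can be illustrated with a minimal sketch. It is not the authors' method or database pipeline. It relies only on the standard genetic code, in which glutamine is encoded by the synonymous codons CAG and CAA. The function names and the example sequence are hypothetical, and the sketch assumes an in-frame coding sequence.

```python
# Glutamine (Q) is encoded by two synonymous codons in the standard genetic code.
GLN_CODONS = {"CAG", "CAA"}


def split_codons(cds):
    """Split an in-frame coding sequence into codons, dropping trailing bases."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def longest_run(items, predicate):
    """Length of the longest consecutive run of items satisfying predicate."""
    best = current = 0
    for item in items:
        current = current + 1 if predicate(item) else 0
        best = max(best, current)
    return best


def polyq_summary(cds):
    """Compare the poly-Q length (protein level) with the longest
    uninterrupted CAG run (DNA level) for a hypothetical coding sequence."""
    codons = split_codons(cds.upper())
    return {
        "poly_q_length": longest_run(codons, lambda c: c in GLN_CODONS),
        "longest_pure_cag": longest_run(codons, lambda c: c == "CAG"),
    }


# Hypothetical example: eight glutamine codons with one synonymous CAA interruption.
example = "CAG" * 4 + "CAA" + "CAG" * 3
print(polyq_summary(example))  # {'poly_q_length': 8, 'longest_pure_cag': 4}
```

In this toy sequence the protein carries eight consecutive glutamines, while the longest pure CAG tract is only four units, which is the sense in which a synonymous change divides the STR without shortening the poly-Q stretch.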