Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128, Mainz, Germany.
BMC Evol Biol. 2020 May 24;20(1):59. doi: 10.1186/s12862-020-01626-3.
Polyglutamine (polyQ) regions are among the most studied and most prevalent homorepeats in eukaryotes. Their codon usage is length-dependent, which relates to a characteristic CAG-slippage mechanism. Pathologically expanded polyQ tracts are known to form aggregates and are involved in the development of several human neurodegenerative diseases. The non-pathogenic function of polyQ is to mediate protein-protein interactions via coiled-coil pairing with an interactor. PolyQ regions are usually located in a helical context.
Here we study the stability of polyQ regions in evolution, using a set of 60 proteomes from four distinct taxonomic groups (Insecta, Teleostei, Sauria and Mammalia). The polyQ regions can be grouped into three distinct categories based on their evolutionary stability: stable, unstable by length variation (inserted), and unstable by mutations (mutated). PolyQ regions in these categories differ significantly in their glutamine codon usage, and we show that the CAG-slippage mechanism is predominant in inserted polyQ of Sauria and Mammalia. The amino acid context of a polyQ is also influenced by its stability, with a higher proportion of proline residues around inserted polyQ. By studying the secondary structure of the sequences surrounding polyQ regions, we found that the structural conformation around a polyQ depends more on its stability category than on its taxonomic group. The protein-protein interaction capacity of a polyQ is also affected by its stability, as stable polyQ have more interactors than unstable polyQ.
Our results show that, apart from the sequence of a polyQ, information about its orthologous sequences is needed to assess its function. Codon usage, amino acid context, structural conformation and protein-protein interaction capacity of polyQ from all studied taxa critically depend on the stability of the region. There are, however, some taxa-specific polyQ features that override this dependence. We conclude that a taxa-driven evolutionary analysis is of the highest importance for the comprehensive study of any feature of polyglutamine regions.
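The glutamine codon usage referred to above can be illustrated with a minimal sketch. It scans an in-frame coding sequence for runs of consecutive glutamine codons (CAG or CAA) and reports the fraction of CAG in each run. The function name `polyq_codon_usage` and the `min_len` threshold are assumptions made for illustration; the abstract does not state how polyQ regions were delimited, so this should not be read as the authors' method.

```python
import re


def polyq_codon_usage(cds, min_len=4):
    """Report CAG usage for each run of glutamine codons in a coding sequence.

    cds is assumed to be in frame, starting at the first codon.
    min_len is a hypothetical cutoff (in codons) for calling a run a polyQ tract.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]

    # One character per codon: "Q" for a glutamine codon, "x" otherwise.
    mask = "".join("Q" if c in ("CAG", "CAA") else "x" for c in codons)

    tracts = []
    for match in re.finditer(r"Q{%d,}" % min_len, mask):
        tract = codons[match.start():match.end()]
        tracts.append({
            "start": match.start(),  # 0-based codon index of the run
            "length": len(tract),    # run length in codons
            "cag_fraction": tract.count("CAG") / len(tract),
        })
    return tracts


# Toy sequence: a run of six glutamine codons (five CAG, one CAA),
# then a glycine codon, then a run of two CAA that is too short to report.
example = "ATG" + "CAG" * 3 + "CAA" + "CAG" * 2 + "GGC" + "CAA" * 2 + "TAA"
for tract in polyq_codon_usage(example):
    print(tract)
```

A high CAG fraction in a long run is the kind of signal that would be consistent with CAG slippage, whereas interspersed CAA codons interrupt the pure repeat. Comparing such fractions across orthologs would require an alignment step that this sketch does not attempt.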