The importance of definitions in the study of polyQ regions: A tale of thresholds, impurities and sequence context.

Author information

Mier Pablo, Elena-Real Carlos, Urbanek Annika, Bernadó Pau, Andrade-Navarro Miguel A

Affiliations

Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany.

Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29, rue de Navacelles, 34090 Montpellier, France.

Publication information

Comput Struct Biotechnol J. 2020 Feb 4;18:306-313. doi: 10.1016/j.csbj.2020.01.012. eCollection 2020.

Abstract

Polyglutamine (polyQ) regions are among the most prevalent homorepeats in eukaryotes. Their prevalence is nevertheless difficult to evaluate, because different studies report different results. The reason is the lack of consensus on what constitutes a polyQ region. We have tackled this issue by studying how the use of different thresholds (i.e., the minimum number of glutamines required in a protein region of a given size) to detect polyQ regions in the human proteome influences not only their prevalence but also their general features and sequence context. Threshold definition shapes the length distribution of the polyQ dataset and changes the observed number and position of impurities (amino acids other than glutamine) within polyQ regions. Irrespective of the chosen threshold, leucine and proline residues are enriched both within and around polyQ regions. While leucine is enriched at the N-terminus of polyQ, especially at position -1 (the amino acid preceding the polyQ), proline is prevalent at the C-terminus (positions +1 to +5, that is, the first five amino acids after the polyQ). We also checked the suitability of these thresholds for other species and compared their polyQ features with those found in humans. As the sequence context and features of polyQ regions are threshold-dependent, we propose a method to quickly scan the polyQ landscape of a proteome. We complement our results with a summary of the biases to be expected for each threshold when studying polyQ regions.
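To make the notion of a threshold concrete, the following Python sketch scans a sequence for windows that contain at least a minimum number of glutamines and reports the resulting regions. It is an illustration only, not the authors' implementation: the function name `find_polyq`, the default threshold of 8 Q in a window of 10, the rule of merging overlapping or touching windows, and the trimming of non-Q residues at region edges are all assumptions made for this example.

```python
def find_polyq(seq, min_q=8, window=10):
    """Return 0-based, half-open (start, end) spans of candidate polyQ regions.

    A window qualifies when it holds at least `min_q` glutamines (Q) among
    `window` consecutive residues. Hypothetical helper; the merging and
    trimming rules below are assumptions, not taken from the paper.
    """
    if not 1 <= min_q <= window:
        raise ValueError("require 1 <= min_q <= window")
    seq = seq.upper()

    # Step 1: collect every window start that satisfies the threshold.
    hits = [i for i in range(len(seq) - window + 1)
            if seq[i:i + window].count("Q") >= min_q]

    # Step 2: merge overlapping or touching windows into single regions.
    regions = []
    for i in hits:
        if regions and i <= regions[-1][1]:
            regions[-1][1] = max(regions[-1][1], i + window)
        else:
            regions.append([i, i + window])

    # Step 3: trim so each region starts and ends with Q (assumed convention).
    trimmed = []
    for start, end in regions:
        while seq[start] != "Q":
            start += 1
        while seq[end - 1] != "Q":
            end -= 1
        trimmed.append((start, end))
    return trimmed


# Toy sequence (invented for illustration): a leucine at position -1,
# one histidine impurity inside the tract, and prolines at +1 to +5.
toy = "MAL" + "QQQQHQQQQQ" + "PPPPP"

relaxed = find_polyq(toy, min_q=8, window=10)  # tolerates impurities
strict = find_polyq(toy, min_q=4, window=4)    # pure Q stretches only

start, end = relaxed[0]
upstream = toy[start - 1]        # residue at position -1
downstream = toy[end:end + 5]    # residues at positions +1 to +5
```

On this toy sequence the relaxed threshold returns one region that includes the histidine impurity, whereas the strict threshold splits the same tract into two pure stretches. This is the kind of threshold dependence in region count, length, and impurity content that the abstract describes.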

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5158/7016039/1d672195e276/ga1.jpg