

Assessing the low complexity of protein sequences via the low complexity triangle.

Affiliation

Faculty of Biology, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, Mainz, Germany.

Publication information

PLoS One. 2020 Dec 30;15(12):e0239154. doi: 10.1371/journal.pone.0239154. eCollection 2020.

Abstract

BACKGROUND

Proteins with low complexity regions (LCRs) have atypical sequence and structural features. Their amino acid composition deviates from the expected composition, as determined proteome-wide, and they do not follow the structural folding rules that prevail in globular regions. One way to characterize these regions is to assess the repeatability of a sequence, that is, to calculate the local propensity of a region to be part of a repeat.

RESULTS

We combine two local measures of low complexity, repeatability (computed with the RES algorithm) and the fraction of the most frequent amino acid, to evaluate different proteomes, datasets of protein regions with specific features, and individual proteins with extreme compositions. As a proof of concept, we use a representation called the 'low complexity triangle' to display the measured low complexity values. The results show that proteomes have distinct signatures in the low complexity triangle, and that these signatures are associated with complexity features of the sequences. We developed a web tool called LCT (http://cbdm-01.zdv.uni-mainz.de/~munoz/lct/) that allows users to calculate the low complexity triangle of a given protein or region of interest.
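To make the two local measures concrete, the following Python sketch computes a pair of values per sliding window. The fraction of the most frequent amino acid follows directly from its description above. The repeatability value does not: the abstract does not specify the RES algorithm, so `shift_identity` is a hypothetical stand-in (the best fraction of positions that match the window shifted by a few residues), and the function names, the window size of 20 and the maximum shift of 10 are all assumptions made for illustration.

```python
from collections import Counter


def top_fraction(window: str) -> float:
    """Fraction of the window occupied by its most frequent residue."""
    if not window:
        raise ValueError("window must not be empty")
    return Counter(window).most_common(1)[0][1] / len(window)


def shift_identity(window: str, max_shift: int = 10) -> float:
    """Crude repeatability proxy (an assumption, NOT the RES algorithm).

    For each shift k, count positions where the window matches itself
    shifted by k residues, and return the best matching fraction.
    """
    best = 0.0
    for k in range(1, min(max_shift, len(window) - 1) + 1):
        matches = sum(a == b for a, b in zip(window, window[k:]))
        best = max(best, matches / (len(window) - k))
    return best


def profile(seq: str, size: int = 20) -> list[tuple[float, float]]:
    """Return (top_fraction, shift_identity) for every window of `size`."""
    if len(seq) < size:
        raise ValueError("sequence is shorter than the window size")
    return [
        (top_fraction(seq[i:i + size]), shift_identity(seq[i:i + size]))
        for i in range(len(seq) - size + 1)
    ]


if __name__ == "__main__":
    # Toy inputs, not data from the study.
    for name, seq in [("homorepeat", "Q" * 20),
                      ("direpeat", "AQ" * 10),
                      ("all distinct", "ACDEFGHIKLMNPQRSTVWY")]:
        print(name, profile(seq))
```

Under this proxy a homorepeat window scores 1.0 on both measures, a perfect direpeat scores 0.5 on composition and 1.0 on the proxy, and a window of 20 distinct residues scores 0.05 and 0.0. Plotting such pairs would give a scatter in the spirit of the triangle, but only the LCT tool reproduces the published values.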

CONCLUSIONS

The low complexity triangle proves to be a suitable way to represent the overall low complexity of a sequence or protein dataset. Homorepeats, direpeats, compositionally biased regions and globular regions occupy characteristic positions in the triangle. The described pipeline can be used to characterize LCRs and may help quantify the content of degenerate tandem repeats in proteins and proteomes.


