Department of Computer Science, Faculty of Exact Sciences, University of Bejaia, 06000 Bejaia, Algeria.
Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany.
J Struct Biol. 2019 Nov 1;208(2):86-91. doi: 10.1016/j.jsb.2019.08.003. Epub 2019 Aug 10.
Low complexity regions (LCRs) in protein sequences have special properties that are very different from those of globular proteins. The rules that define secondary structure elements do not apply when the distribution of amino acids becomes biased. While there is a tendency towards structural disorder in LCRs, various examples, particularly homorepeats of single amino acids, suggest that very short repeats could adopt structures that are very difficult to predict. These structures are possibly variable and dependent on the context of intra- or inter-molecular interactions. In general, short repeats in LCRs can induce structure. This could explain the observation that very short (non-perfect) repeats are widespread and that many define regions with a function in protein interactions. For these reasons, we have developed an algorithm to quickly analyze local repeatability along protein sequences, that is, how close a protein fragment is to a perfect repeat. Using this algorithm, we found that the proteins of the yeast Saccharomyces cerevisiae are depleted in short repeats (approximate or not) of odd length, whereas human proteins are not; that the fish Danio rerio has many proteins with repeats of length two; and that the plant Arabidopsis thaliana has an unusually large number of repeats of length seven. Our method (REpeatability Scanner, RES, accessible at http://cbdm-01.zdv.uni-mainz.de/~munoz/res/) makes it possible to find regions with approximate short repeats in protein sequences, and helps to characterize the variable use of LCRs and compositional bias in different organisms.
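The abstract does not give the scoring formula used by RES, so the following is only a minimal sketch of one way the idea of "how close a fragment is to a perfect repeat" could be quantified. It assumes a simple score: the fraction of positions in a fragment whose residue equals the residue one repeat length further along. The function names `repeatability` and `scan`, and the `window` and `period` parameters, are hypothetical and are not taken from RES.

```python
def repeatability(fragment: str, period: int) -> float:
    """Fraction of positions whose residue matches the one `period` steps later.

    Assumed score, not the published RES definition: 1.0 means the fragment
    is a perfect repeat of length `period`; lower values mean more mismatches.
    """
    if period < 1 or period >= len(fragment):
        raise ValueError("period must satisfy 1 <= period < len(fragment)")
    pairs = len(fragment) - period
    matches = sum(fragment[i] == fragment[i + period] for i in range(pairs))
    return matches / pairs


def scan(sequence: str, window: int, period: int) -> list[float]:
    """Slide a window along the sequence and score each fragment."""
    if window > len(sequence):
        raise ValueError("window must not exceed the sequence length")
    return [
        repeatability(sequence[start:start + window], period)
        for start in range(len(sequence) - window + 1)
    ]


if __name__ == "__main__":
    # A perfect repeat of length two scores 1.0 for period 2 and 0.0 for period 1.
    print(repeatability("QAQAQAQA", 2))
    print(repeatability("QAQAQAQA", 1))
    # One substitution (A -> C) turns it into an approximate repeat.
    print(repeatability("QAQAQCQA", 2))
    # Per-window scores along a sequence that starts with a homorepeat.
    print(scan("AAAAKLMN", window=4, period=1))
```

Under this assumed score, a homorepeat is the special case `period=1`, and scanning the same sequence with several values of `period` would indicate which repeat length, if any, dominates each region.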