Basu Sankar, Söderquist Fredrik, Wallner Björn
Bioinformatics Division, Department of Physics, Chemistry and Biology, Linköping University, Linköping, Sweden.
Department of Biochemistry, University of Calcutta, Kolkata, 700019, India.
J Comput Aided Mol Des. 2017 May;31(5):453-466. doi: 10.1007/s10822-017-0020-y. Epub 2017 Apr 1.
Over the past one-and-a-half decades, the focus of the computational structural biology community has shifted dramatically from the classical protein structure prediction problem toward understanding intrinsically disordered proteins (IDPs) and proteins containing intrinsically disordered regions (IDPRs). Current interest lies in unraveling a disorder-to-order transitioning code embedded in the amino acid sequences of IDPs/IDPRs. Disordered proteins are characterized by an enormous amount of structural plasticity, which makes them promiscuous in binding to different partners, multi-functional in cellular activity, and atypical in their folding energy landscapes, which resemble those of partially folded molten globules. Their involvement in several deadly human diseases (e.g. cancer, cardiovascular and neurodegenerative diseases) also makes them attractive drug targets and important for a biochemical understanding of these diseases. Studying the structural ensemble of IDPs is difficult, in particular for transient interactions. When bound to a structured partner, an IDPR adopts an ordered conformation in the complex. The residues that undergo this disorder-to-order transition are called protean residues and are generally found in short contiguous stretches. A first step in understanding the modus operandi of an IDP/IDPR is therefore to predict these residues. A few available methods predict these protean segments from amino acid sequence; however, their performance as reported in the literature leaves clear room for improvement. Against this background, the current study presents 'Proteus', a random forest classifier that predicts the likelihood of a residue undergoing a disorder-to-order transition upon binding to a potential partner protein. The prediction is based on features that can be calculated from the amino acid sequence alone.
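As a rough illustration of what "features calculated from the amino acid sequence alone" can look like, the sketch below computes two simple per-residue quantities over a sliding window: mean Kyte-Doolittle hydropathy and the fraction of charged residues. This is only a minimal sketch under stated assumptions. The abstract does not say which features Proteus uses, so the function name `window_features`, the `half_width` parameter, and the choice of these two features are all hypothetical.

```python
# Hypothetical per-residue sequence features; NOT the Proteus feature set.
# Kyte-Doolittle hydropathy scale (standard published values).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
CHARGED = set("DEKR")


def window_features(seq, half_width=2):
    """Return one (mean_hydropathy, charged_fraction) tuple per residue.

    Each tuple summarizes the window of up to 2*half_width + 1 residues
    centered on that position; windows are truncated at the sequence ends.
    Residues missing from the scale (e.g. 'X') contribute 0.0 hydropathy.
    """
    features = []
    for i in range(len(seq)):
        lo = max(0, i - half_width)
        hi = min(len(seq), i + half_width + 1)
        window = seq[lo:hi]
        mean_hyd = sum(KD.get(aa, 0.0) for aa in window) / len(window)
        charged = sum(aa in CHARGED for aa in window) / len(window)
        features.append((mean_hyd, charged))
    return features
```

A feature table of this kind, one row per residue, is the sort of input a random forest classifier could be trained on to score each residue individually.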
Proteus compares favorably with existing methods, predicting twice as many true positives as the second-best method (55 vs. 27%) with a much higher precision on an independent data set. The current study also sheds some light on a possible 'disorder-to-order' transitioning consensus, untangled, yet embedded in the amino acid sequence of IDPs. Some guidelines are also suggested for proceeding with real-life structural modeling involving an IDPR using Proteus.
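The two quantities in this comparison can be made concrete with a short sketch. It assumes that the reported percentages (55 vs. 27%) are the fraction of true protean residues recovered, i.e. recall, which the abstract implies but does not state outright. The function name `recall_precision` and the toy labels are hypothetical and are not data from the study.

```python
def recall_precision(true_labels, predicted_labels):
    """Per-residue recall and precision for binary labels (1 = protean).

    recall    = TP / (TP + FN): fraction of true protean residues found
    precision = TP / (TP + FP): fraction of predicted residues that are true
    Returns 0.0 for a ratio whose denominator is zero.
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


# Toy example with made-up labels for six residues.
truth = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]
recall, precision = recall_precision(truth, predicted)
```

Reporting both numbers matters because a method can raise recall simply by predicting more residues as protean, at the cost of precision.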