Department of Electrical and Computer Engineering, University of Alberta, Edmonton, T6G 2V4, Canada.
Bioinformatics. 2012 Jun 15;28(12):i75-83. doi: 10.1093/bioinformatics/bts209.
Molecular recognition features (MoRFs) are short binding regions located within longer intrinsically disordered regions that bind to protein partners via disorder-to-order transitions. MoRFs are implicated in important processes including signaling and regulation. However, only a limited number of experimentally validated MoRFs are known, which motivates the development of computational methods that predict MoRFs from protein chains.
We introduce a new MoRF predictor, MoRFpred, which identifies all MoRF types (α, β, coil and complex). We develop a comprehensive dataset of annotated MoRFs to build and empirically evaluate our method. MoRFpred utilizes a novel design in which annotations generated by sequence alignment are fused with predictions generated by a Support Vector Machine (SVM), which uses a custom-designed set of sequence-derived features. The features provide information about evolutionary profiles, selected physicochemical properties of amino acids, and predicted disorder, solvent accessibility and B-factors. Empirical evaluation on several datasets shows that MoRFpred outperforms related methods: α-MoRF-Pred, which predicts α-MoRFs, and ANCHOR, which finds disordered regions that become ordered when bound to a globular partner. We show that our predicted (new) MoRF regions have non-random sequence similarity with native MoRFs. We use this observation, along with the fact that predictions with higher probability are more accurate, to identify putative MoRF regions. We also identify a few sequence-derived hallmarks of MoRFs: they are characterized by dips in the disorder predictions and by higher hydrophobicity and stability when compared to adjacent residues in the chain.
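The fusion of alignment-derived annotations with per-residue SVM predictions can be illustrated with a minimal sketch. The abstract does not specify the fusion rule, so the code below assumes a simple one for illustration only: residues covered by an alignment hit have their score raised to a fixed value, and contiguous runs above a threshold are reported as putative regions. The function names, the `aligned_score`, `threshold` and `min_len` parameters, and their default values are all hypothetical and are not taken from MoRFpred.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # half-open residue interval [start, end)


def fuse_predictions(
    svm_probs: List[float],
    aligned_regions: List[Region],
    aligned_score: float = 1.0,  # hypothetical score given to aligned residues
) -> List[float]:
    """Combine per-residue SVM probabilities with alignment-based annotations.

    Assumed rule (not from the paper): a residue covered by an alignment hit
    receives the larger of its SVM probability and `aligned_score`.
    """
    fused = list(svm_probs)
    for start, end in aligned_regions:
        for i in range(max(start, 0), min(end, len(fused))):
            fused[i] = max(fused[i], aligned_score)
    return fused


def call_regions(
    scores: List[float],
    threshold: float = 0.5,  # hypothetical decision cutoff
    min_len: int = 5,        # hypothetical minimum region length
) -> List[Region]:
    """Return contiguous runs of residues scoring at or above `threshold`."""
    regions: List[Region] = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    # Close a run that extends to the end of the chain.
    if start is not None and len(scores) - start >= min_len:
        regions.append((start, len(scores)))
    return regions


if __name__ == "__main__":
    svm_probs = [0.1, 0.2, 0.6, 0.7, 0.3, 0.1, 0.1, 0.1]
    fused = fuse_predictions(svm_probs, aligned_regions=[(4, 7)])
    print(call_regions(fused))
```

In this toy input the SVM alone flags only two residues, which is too short to be reported; once the alignment hit is fused in, the combined run meets the minimum length. This is only meant to show why combining the two sources can recover regions that either source alone would miss.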
http://biomine.ece.ualberta.ca/MoRFpred/; http://biomine.ece.ualberta.ca/MoRFpred/Supplement.pdf.