Department of Environmental Health, University of Cincinnati, Cincinnati, OH 45267, USA.
Curr Protein Pept Sci. 2011 Sep;12(6):563-73. doi: 10.2174/138920311796957603.
Ongoing efforts to improve protein structure prediction stimulate the development of scoring functions and methods for model quality assessment (MQA) that can be used to rank and select the best protein models for further refinement. In this work, sequence-based prediction of relative solvent accessibility (RSA) is employed as the basis for a simple MQA method for soluble proteins, which is subsequently extended to the much less explored case of (alpha-helical) membrane proteins. In analogy to soluble proteins, the level of exposure of amino acid residues in transmembrane (TM) domains to the lipid is captured in terms of the relative lipid accessibility (RLA), which is predicted from sequence using low-complexity Support Vector Regression (SVR) models. On an independent set of 23 TM proteins, the new SVR-based predictor yields a correlation coefficient (CC) of 0.56 between the predicted and observed RLA profiles, compared with a CC of 0.13 for a baseline predictor that utilizes the TMLIP2H empirical lipophilicity scale (with standard deviations of about 0.15). A simple MQA approach is then defined by ranking models of membrane proteins in terms of the consistency between predicted and observed RLA profiles, as a measure of similarity to the native structure. The new method does not require a set of decoy models to optimize parameters, circumventing current limitations in this regard. Several different sets of models, including those generated by fragment-based folding simulations and decoys obtained by swapping TM helices to mimic errors in template-based assignment, are used to assess the new approach. Predicted RLA profiles can be used to successfully discriminate near-native models from non-native decoys in most cases, significantly improving the separation of correctly and incorrectly folded models compared with a simple baseline approach that utilizes TMLIP2H.
As suggested by the robust performance of a simple MQA method for soluble proteins that utilizes more accurate RSA predictions, further significant improvements are likely to be achieved. The steady growth in the number of resolved membrane protein structures is expected to yield enhanced RLA predictions, facilitating further efforts to improve de novo and template-based prediction of membrane protein structure.
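The evaluation and ranking steps described in the abstract can be illustrated with a minimal sketch; this is not the authors' implementation. It assumes that per-residue RLA profiles are already available as equal-length lists over the same TM residues (one predicted from sequence, one computed from each model's or the native structure), and it assumes the Pearson correlation coefficient as the consistency measure, with per-protein CC values averaged over a test set. The function names (`pearson_cc`, `evaluate_predictor`, `rank_models`) and the toy profiles are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple


def pearson_cc(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length RLA profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx, my = statistics.fmean(x), statistics.fmean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no ranking signal
    return sxy / math.sqrt(sxx * syy)


def evaluate_predictor(
    pairs: List[Tuple[Sequence[float], Sequence[float]]]
) -> Tuple[float, float]:
    """Mean and standard deviation of per-protein CC (predicted vs observed RLA).

    Assumption: CC is computed per protein and then averaged over the test set.
    """
    ccs = [pearson_cc(pred, obs) for pred, obs in pairs]
    return statistics.fmean(ccs), statistics.pstdev(ccs)


def rank_models(
    predicted_rla: Sequence[float], model_rla: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Rank models by consistency between predicted and model-derived RLA, best first."""
    scores = [(name, pearson_cc(predicted_rla, obs)) for name, obs in model_rla.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy profiles over six TM residues (values in [0, 1]).
    predicted = [0.9, 0.8, 0.2, 0.1, 0.7, 0.3]
    models = {
        "model_A": [0.85, 0.75, 0.25, 0.15, 0.65, 0.35],  # similar exposure pattern
        "model_B": [1 - v for v in predicted],  # inverted pattern, e.g. a misplaced helix
    }
    ranking = rank_models(predicted, models)
    print(ranking[0][0])  # model_A ranks first
```

Because the score only compares a model against a profile predicted from sequence, no decoy set is needed to fit parameters, which mirrors the point made in the abstract; a real pipeline would replace the toy lists with RLA values computed from each model's 3D structure.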