Scoring function to predict solubility mutagenesis.

Authors

Tian Ye, Deutsch Christopher, Krishnamoorthy Bala

Affiliation

Department of Mathematics, Washington State University, Pullman, WA 99164, USA.

Publication

Algorithms Mol Biol. 2010 Oct 7;5:33. doi: 10.1186/1748-7188-5-33.

Abstract

BACKGROUND

Mutagenesis is commonly used to engineer proteins with desirable properties not present in the wild type (WT) protein, such as increased or decreased stability, reactivity, or solubility. Experimentalists often have to choose a small subset of mutations from a large number of candidates to obtain the desired change, and computational techniques are invaluable for making these choices. While several such methods have been proposed to predict the effects of mutagenesis on stability and reactivity, solubility has received little attention.

RESULTS

We use concepts from computational geometry to define a three-body scoring function that predicts the change in protein solubility due to mutations. The scoring function captures both sequence and structure information. By exploring the literature, we have assembled a substantial database of 137 single- and multiple-point solubility mutations. Our database is the largest known collection of its kind with structural information. We optimize the scoring function using linear programming (LP) methods to derive its weights from training data. Starting with default values of 1, we find weights in the range [0,2] that optimize predictions of increase or decrease in solubility. We compare the LP method to the standard machine learning techniques of support vector machines (SVM) and the Lasso. Using statistics for leave-one-out (LOO), 10-fold, and 3-fold cross-validation (CV) for training and prediction, we demonstrate that the LP method performs best overall. In LOOCV, the LP method has an overall accuracy of 81%.
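The evaluation protocol described above (weights that start at 1, stay in [0,2], and are judged by LOO and k-fold cross-validation accuracy) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the feature vectors, the toy data, and the function names are hypothetical, and the trainer is a simple perceptron-style stand-in, not the LP method of the paper. Only the cross-validation bookkeeping is meant to mirror the abstract.

```python
from typing import List, Tuple

# Hypothetical representation: each mutant is (feature vector, label), where
# label +1 means solubility increased and -1 means it decreased.
Mutant = Tuple[List[float], int]

def clip(value: float, lo: float = 0.0, hi: float = 2.0) -> float:
    """Keep a weight inside the [0, 2] range mentioned in the abstract."""
    return max(lo, min(hi, value))

def score(weights: List[float], features: List[float]) -> float:
    """Weighted sum of the features (assumed linear form of the score)."""
    return sum(w * x for w, x in zip(weights, features))

def train_weights(data: List[Mutant], epochs: int = 50, step: float = 0.1) -> List[float]:
    """Stand-in trainer (NOT the paper's LP): perceptron-style updates.

    Weights start at the default value 1 and are clipped to [0, 2].
    """
    weights = [1.0] * len(data[0][0])
    for _ in range(epochs):
        for features, label in data:
            if label * score(weights, features) <= 0:  # misclassified
                weights = [clip(w + step * label * x)
                           for w, x in zip(weights, features)]
    return weights

def predict(weights: List[float], features: List[float]) -> int:
    """Predict +1 (increase) or -1 (decrease) from the sign of the score."""
    return 1 if score(weights, features) > 0 else -1

def k_fold_accuracy(data: List[Mutant], k: int) -> float:
    """Train on k-1 folds, predict the held-out fold, pool the results."""
    folds = [data[i::k] for i in range(k)]
    correct = 0
    for i, held_out in enumerate(folds):
        training = [m for j, fold in enumerate(folds) if j != i for m in fold]
        weights = train_weights(training)
        correct += sum(predict(weights, x) == y for x, y in held_out)
    return correct / len(data)

def loo_accuracy(data: List[Mutant]) -> float:
    """Leave-one-out CV is k-fold CV with one mutant per fold."""
    return k_fold_accuracy(data, len(data))

# Made-up toy data, for illustration only.
TOY_DATA: List[Mutant] = [
    ([1.0, -0.5], 1),
    ([0.8, -0.2], 1),
    ([-1.0, 0.3], -1),
    ([-0.6, 0.1], -1),
    ([0.5, -1.0], 1),
]

if __name__ == "__main__":
    print("LOO accuracy:", loo_accuracy(TOY_DATA))
    print("3-fold accuracy:", k_fold_accuracy(TOY_DATA, 3))
```

Swapping `train_weights` for an LP-based or SVM-based trainer would leave the cross-validation loop unchanged, which is what makes a like-for-like comparison of methods possible.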

AVAILABILITY

Executables of programs, tables of weights, and datasets of mutants are available from the following web page: http://www.wsu.edu/~kbala/OptSolMut.html.
