Wei Y, Floudas C A
Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544-5263, U.S.A.
Chem Eng Sci. 2011 Oct 1;66(19):4356-4369. doi: 10.1016/j.ces.2011.04.033.
This paper builds on recent work by McAllister and Floudas, who developed a mathematical optimization model to predict contacts in transmembrane alpha-helical proteins from a limited protein data set [1]. We enhance that method in three ways: 1) building a more comprehensive data set of transmembrane alpha-helical proteins, which is then used to construct the probability sets, MIN-1N and MIN-2N, for residue contact prediction; 2) improving the mathematical model by modifying several important physical constraints; and 3) applying a new blind contact prediction scheme, proposed from an analysis of contact prediction on 65 proteins from Fuchs et al. [2], to different protein sets. The blind contact prediction scheme was tested on two different membrane protein sets. First, it was applied to five carefully selected proteins from the training set. For each of these five proteins, the probability sets were built with the target protein excluded from the training set, and an average accuracy of 56% was obtained. Second, it was applied to six independent membrane proteins with complicated topologies, giving prediction accuracies of 73% for 2ZY9A, 21% for 3KCUA, 46% for 2W1PA, 64% for 3CN5A, 77% for 3IXZA and 83% for 3K3FA, for an average of 60.7% over the six proteins. The proposed approach is also compared with a support vector machine method (TMhit [3]) and is shown to achieve better prediction accuracy.
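The reported six-protein average can be checked directly from the per-protein figures. The short Python sketch below only reproduces that arithmetic; the accuracy values are taken from the abstract, while the names `accuracies` and `mean_accuracy` are illustrative and not part of the original work.

```python
# Per-protein prediction accuracies (%) for the six independent
# membrane proteins, as reported in the abstract.
accuracies = {
    "2ZY9A": 73,
    "3KCUA": 21,
    "2W1PA": 46,
    "3CN5A": 64,
    "3IXZA": 77,
    "3K3FA": 83,
}


def mean_accuracy(values):
    """Return the arithmetic mean of a collection of accuracies."""
    values = list(values)
    return sum(values) / len(values)


# Hypothetical helper call: average over the six proteins.
average = mean_accuracy(accuracies.values())
print(f"Average accuracy over {len(accuracies)} proteins: {average:.1f}%")
```

The unweighted mean treats each protein equally regardless of how many contacts were predicted for it, which is the simplest reading consistent with the 60.7% figure.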