Zhang Huiling, Huang Qingsheng, Bei Zhendong, Wei Yanjie, Floudas Christodoulos A
Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
Center for Cloud Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
Proteins. 2016 Mar;84(3):332-48. doi: 10.1002/prot.24979. Epub 2016 Jan 20.
In this article, we present COMSAT, a hybrid framework for residue contact prediction of transmembrane (TM) proteins that integrates a support vector machine (SVM) method and a mixed integer linear programming (MILP) method. COMSAT consists of two modules: COMSAT_SVM, which is trained mainly on position-specific scoring matrix features, and COMSAT_MILP, an ab initio method based on optimization models. Contacts predicted by the SVM model are ranked by SVM confidence score, and a threshold on this score is trained to improve the reliability of the predicted contacts. For TM proteins with no contacts above the threshold, COMSAT_MILP is used. The proposed hybrid contact prediction scheme was tested on two independent TM protein sets, using a contact definition of 14 Å between Cα-Cα atoms. First, using rigorous leave-one-protein-out cross-validation on the training set of 90 TM proteins, an accuracy of 66.8%, a coverage of 12.3%, a specificity of 99.3% and a Matthews correlation coefficient (MCC) of 0.184 were obtained for residue pairs that are at least six amino acids apart. Second, on a test set of 87 TM proteins, the proposed method showed a prediction accuracy of 64.5%, a coverage of 5.3%, a specificity of 99.4% and an MCC of 0.106. COMSAT shows satisfactory results when compared with 12 other state-of-the-art predictors, and its prediction accuracy is more robust as the length and complexity of TM proteins increase. COMSAT is freely accessible at http://hpcc.siat.ac.cn/COMSAT/.
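The abstract outlines the hybrid scheme and its evaluation without defining every step. The Python sketch below illustrates three pieces under stated assumptions: labelling true contacts with the 14 Å Cα-Cα cutoff and the six-residue separation (read here as j - i >= 6, with "within" taken as `<=`), the rule of keeping SVM contacts above the trained threshold and otherwise falling back to the MILP module, and the four reported metrics using the definitions common in contact-prediction work (accuracy as TP/(TP+FP), coverage as TP/(TP+FN)). The names `true_contacts`, `hybrid_predict`, `evaluate` and the `milp_predict` callback are hypothetical; this is not COMSAT's code, and the SVM and MILP models themselves are not shown.

```python
import math
from typing import Callable, Dict, Iterable, List, Sequence, Set, Tuple

# Hypothetical sketch, not COMSAT's implementation.
Pair = Tuple[int, int]              # 0-based residue indices (i, j) with i < j
Coord = Tuple[float, float, float]  # C-alpha coordinates in angstroms


def true_contacts(ca: Sequence[Coord], cutoff: float = 14.0, min_sep: int = 6) -> Set[Pair]:
    """Pairs with j - i >= min_sep whose C-alpha atoms lie within `cutoff` angstroms (assumed <=)."""
    n = len(ca)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + min_sep, n)
        if math.dist(ca[i], ca[j]) <= cutoff
    }


def hybrid_predict(svm_scores: Dict[Pair, float], threshold: float,
                   milp_predict: Callable[[], List[Pair]]) -> List[Pair]:
    """Return SVM pairs scoring above `threshold`, best first; if none, defer to the MILP callback."""
    ranked = sorted(svm_scores.items(), key=lambda kv: kv[1], reverse=True)
    selected = [pair for pair, score in ranked if score > threshold]
    return selected if selected else milp_predict()


def evaluate(predicted: Iterable[Pair], actual: Iterable[Pair],
             n_residues: int, min_sep: int = 6) -> Dict[str, float]:
    """Accuracy, coverage, specificity and MCC over all residue pairs with j - i >= min_sep."""
    candidates = {(i, j) for i in range(n_residues) for j in range(i + min_sep, n_residues)}
    pred = set(predicted) & candidates
    real = set(actual) & candidates
    tp = len(pred & real)
    fp = len(pred - real)
    fn = len(real - pred)
    tn = len(candidates) - tp - fp - fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": tp / (tp + fp) if tp + fp else 0.0,     # share of predicted contacts that are real
        "coverage": tp / (tp + fn) if tp + fn else 0.0,     # share of real contacts that were predicted
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # share of non-contacts correctly rejected
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


if __name__ == "__main__":
    # Toy hairpin: residues 0-3 run along x, residues 4-7 run back 5 angstroms away.
    ca = [(3.8 * i, 0.0, 0.0) for i in range(4)] + [(3.8 * (7 - i), 5.0, 0.0) for i in range(4, 8)]
    print(sorted(true_contacts(ca)))  # [(0, 6), (0, 7), (1, 7)]
    scores = {(0, 6): 0.9, (1, 7): 0.4, (0, 7): 0.7}
    print(hybrid_predict(scores, 0.5, lambda: [(2, 9)]))  # [(0, 6), (0, 7)]
    print(evaluate([(0, 7), (2, 9)], true_contacts(ca), n_residues=10))
```

Because specificity is computed over every candidate pair and true contacts are sparse, specificity stays close to 100% even when coverage is low, which is consistent with the pattern of the figures reported above; MCC balances all four counts and is therefore the more informative single summary.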