School of Information Engineering, Huangshan University, Huangshan, 245041, China.
Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, China.
Interdiscip Sci. 2021 Dec;13(4):693-702. doi: 10.1007/s12539-021-00448-1. Epub 2021 Jun 18.
Transmembrane proteins play a vital role in cellular processes. Several techniques can determine transmembrane protein structures, of which X-ray crystallography is the primary one. However, because of the special properties of transmembrane proteins, determining their structures by X-ray crystallography remains difficult. To reduce experimental cost and improve experimental efficiency, it is important to develop computational methods that predict the crystallization propensity of transmembrane proteins. In this work, we propose a sequence-based machine learning method, Prediction of TransMembrane protein Crystallization propensity (PTMC), to predict the crystallization propensity of transmembrane proteins. First, we obtained several general sequence features together with specific encoded features of relative solvent accessibility and hydrophobicity. Second, we applied feature selection to filter out redundant and irrelevant features; the optimal feature subset consists of hydrophobicity, amino acid composition and relative solvent accessibility. Finally, we chose extreme gradient boosting after comparing it with several other machine learning methods. Comparative results on the independent test set indicate that PTMC outperforms state-of-the-art sequence-based methods in terms of sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC) and area under the receiver operating characteristic curve (AUC). Compared with two competitors, Bcrystal and TMCrys, PTMC improves sensitivity by 0.132 and 0.179, specificity by 0.014 and 0.127, accuracy by 0.037 and 0.192, MCC by 0.128 and 0.362, and AUC by 0.027 and 0.125, respectively.
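As an illustration of two ingredients named in the abstract, the following minimal Python sketch computes an amino acid composition feature vector from a sequence and derives sensitivity, specificity, accuracy and MCC from confusion-matrix counts. It is not the PTMC implementation: the function names `amino_acid_composition` and `binary_metrics`, the handling of non-standard residues, and the convention of returning 0.0 for an undefined ratio are all assumptions made for this example.

```python
import math

# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue in the sequence.

    Assumption: characters outside the 20 standard residues are ignored.
    """
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    total = len(residues)
    return {aa: residues.count(aa) / total for aa in AMINO_ACIDS}


def _ratio(numerator, denominator):
    # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from raw counts."""
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


if __name__ == "__main__":
    # Hypothetical toy sequence and counts, not data from the study.
    composition = amino_acid_composition("MKTLLVAGAL")
    print(composition["L"])
    print(binary_metrics(tp=40, fp=10, tn=45, fn=5))
```

AUC is omitted here because it requires ranked prediction scores rather than confusion-matrix counts alone.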