Mishra Nitish K, Chang Junil, Zhao Patrick X
Plant Biology Division, The Samuel Roberts Noble Foundation, Ardmore, Oklahoma, United States of America.
PLoS One. 2014 Jun 26;9(6):e100278. doi: 10.1371/journal.pone.0100278. eCollection 2014.
Membrane transport proteins (transporters) move hydrophilic substrates across hydrophobic membranes and play vital roles in most cellular functions. Transporters are a diverse group of proteins that differ in topology, energy-coupling mechanism, substrate specificity, and sequence similarity. Among the functional annotations of transporters, information about the substrates they transport is especially important. Experimental identification and characterization of transporters is currently costly and time-consuming. Developing robust bioinformatics methods to predict membrane transport proteins and their substrate specificities is therefore an important and urgent task.
Support vector machine (SVM)-based computational models were developed to predict the substrate specificity of seven transporter classes: amino acid, anion, cation, electron, protein/mRNA, sugar, and other transporters. The models integrate several protein sequence features, including amino acid composition, dipeptide composition, physico-chemical composition, biochemical composition, and position-specific scoring matrices (PSSMs). An additional model was developed to distinguish transporters from non-transporters. The hybrid model combining biochemical composition and PSSM features performed best of all the models. On our main dataset it achieved an overall average prediction accuracy of 76.69%, a Matthews correlation coefficient (MCC) of 0.49, and an area under the receiver operating characteristic curve (AUC) of 0.833. On an independent dataset, the same model achieved an overall average prediction accuracy of 78.88% and an MCC of 0.41.
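The sketch below illustrates two of the sequence features named above. It computes amino acid composition (20 values) and dipeptide composition (400 values) from a single protein sequence. It is a minimal, stdlib-only sketch that assumes the common textbook definitions: residue or adjacent-pair counts divided by the total. The abstract does not state how TrSSP normalizes these values or handles non-standard residues. The function names are hypothetical.

```python
from itertools import product

# The 20 standard amino acids in a fixed (alphabetical one-letter) order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Return the fraction of each standard residue (20 values).

    Non-standard letters (e.g. X, B, U) are ignored in both the counts
    and the denominator (an assumption, not necessarily TrSSP's choice).
    """
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    n = len(residues)
    return [residues.count(aa) / n for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return the fraction of each adjacent residue pair (400 values).

    Pairs containing a non-standard letter are skipped; the remaining
    counts are divided by the number of valid pairs.
    """
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


if __name__ == "__main__":
    sequence = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence
    # Concatenate both encodings into one fixed-length SVM input vector.
    features = amino_acid_composition(sequence) + dipeptide_composition(sequence)
    print(len(features))  # 420
```

Both encodings map sequences of any length to fixed-length vectors, which is what a standard SVM requires. In a hybrid model, vectors such as these would simply be concatenated with other feature vectors before training.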
Our analyses suggest that evolutionary information (i.e., the PSSM) and amino acid index (AAindex) properties are key features for predicting the substrate specificity of transport proteins. By comparison, similarity-based methods such as BLAST, PSI-BLAST, and hidden Markov models do not predict the substrate specificity of membrane transport proteins accurately. The Transporter Substrate Specificity Prediction Server (TrSSP), a web server implementing the SVM models developed in this paper, is freely available at http://bioinfo.noble.org/TrSSP.
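A PSSM has one row of 20 scores per residue, so it must be collapsed into a fixed-length vector before an SVM can use it. The abstract does not say how TrSSP does this. The sketch below shows one widely used encoding, as an assumption rather than the paper's exact procedure:

1. Squash each score with a logistic function.
2. Sum the rows belonging to each of the 20 query residue types.
3. Divide by the sequence length.

This yields a 20 × 20 = 400-value vector. The PSSM is assumed to be an L × 20 list of numeric scores, for example parsed from PSI-BLAST ASCII output. `pssm_composition` and `logistic` are hypothetical names.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def logistic(x):
    """Numerically stable logistic (sigmoid) scaling of a score into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def pssm_composition(seq, pssm):
    """Collapse an L x 20 PSSM into a 400-value vector (assumed encoding).

    Block i of the output sums the logistic-scaled PSSM rows at every
    position where the query residue is AMINO_ACIDS[i]. All values are then
    divided by the full sequence length L. Positions with non-standard
    residues contribute nothing but still count toward L.
    """
    if len(seq) != len(pssm):
        raise ValueError("PSSM must have one row per sequence position")
    if not seq:
        raise ValueError("empty sequence")
    sums = [[0.0] * 20 for _ in range(20)]
    for residue, row in zip(seq.upper(), pssm):
        if len(row) != 20:
            raise ValueError("each PSSM row must have 20 scores")
        if residue not in INDEX:
            continue
        target = sums[INDEX[residue]]
        for j, score in enumerate(row):
            target[j] += logistic(score)
    length = len(seq)
    # Flatten row-major: output[i * 20 + j] is residue type i, column j.
    return [value / length for block in sums for value in block]


if __name__ == "__main__":
    toy_seq = "ACA"
    # Made-up scores for illustration, not real PSI-BLAST output.
    toy_pssm = [[0] * 20, [2] * 20, [-3] * 20]
    vector = pssm_composition(toy_seq, toy_pssm)
    print(len(vector))  # 400
```

Scaling the scores before summing keeps a few very large or very negative substitution scores from dominating the vector. A hybrid model could then concatenate this vector with composition-based features.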