Nilkanth Vipul V, Mande Shekhar C
National Centre for Cell Science, S.P. Pune University Campus, Pune, India.
Council of Scientific and Industrial Research, New Delhi, India.
Proteins. 2022 Jan;90(1):131-141. doi: 10.1002/prot.26195. Epub 2021 Aug 26.
Elucidating the signaling events of a pathogen may be important for tackling the infection it causes. Signaling events mediated by protein phosphorylation play important roles in infection. To predict the phosphosites and substrates of Mycobacterium tuberculosis serine/threonine protein kinases (STPKs), we have developed a machine learning-based approach that uses kinase-peptide structure-sequence data. The approach uses features derived from the three-dimensional structural environment of the kinase and from known phosphosite sequences to generate kinase-specific, support vector machine (SVM)-based phosphosite predictions for STPKs with scarce or no substrate data. Of the four machine learning algorithms we tried (random forest, logistic regression, SVM, and k-nearest neighbors), SVM performed best, with an area under the receiver operating characteristic curve (AUC-ROC) of 0.88 on the independent testing dataset; the final model had a 10-fold cross-validation accuracy of ~81.6%. Our predicted phosphosites of M. tuberculosis STPKs form a useful resource for experimental biologists, enabling elucidation of STPK-mediated posttranslational regulation of important cellular processes.
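The abstract does not say how phosphosite peptides were encoded or exactly how the evaluation was run, so the following is only a minimal standard-library Python sketch of generic steps of this kind, under stated assumptions. It shows extracting a fixed-length window around a candidate Ser/Thr site and one-hot encoding it, computing AUC-ROC as the probability that a randomly chosen positive site scores above a randomly chosen negative one, and splitting indices into 10 folds for cross-validation. The helper names `site_window`, `one_hot`, `roc_auc` and `k_fold_indices`, the ±7-residue flank and the `X` padding symbol are hypothetical choices, not the authors' method; the kinase structure-derived features and the SVM itself are not modeled here.

```python
# Minimal sketch; window length, padding symbol and encoding are assumptions,
# not details taken from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def site_window(sequence, position, flank=7, pad="X"):
    """Return residues position-flank .. position+flank (0-based, inclusive),
    padding with `pad` where the window runs past either terminus."""
    if sequence[position] not in "ST":
        raise ValueError("centre residue must be Ser (S) or Thr (T)")
    return "".join(
        sequence[i] if 0 <= i < len(sequence) else pad
        for i in range(position - flank, position + flank + 1)
    )


def one_hot(window):
    """Encode each residue as 20 binary features; the pad symbol (or any
    non-standard residue) becomes an all-zero block."""
    features = []
    for residue in window:
        features.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return features


def roc_auc(labels, scores):
    """AUC-ROC via the Mann-Whitney statistic: the fraction of
    (positive, negative) pairs where the positive scores higher; ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k contiguous, unshuffled folds
    whose sizes differ by at most one."""
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


if __name__ == "__main__":
    seq = "MTEPSRLGTAK"  # hypothetical toy sequence
    print(site_window(seq, 4, flank=3))  # TEPSRLG
    print(site_window(seq, 1, flank=3))  # XXMTEPS
    print(len(one_hot(site_window(seq, 4))))  # 15 residues x 20 = 300
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
    print([len(test) for _, test in k_fold_indices(25)])  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
```

In practice, a library such as scikit-learn (`SVC`, `roc_auc_score`, `StratifiedKFold`) would normally replace these hand-written helpers; stratified, shuffled folds are usually preferred over contiguous ones when positive sites are rare.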