Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, Taiwan.
PLoS One. 2012;7(7):e40694. doi: 10.1371/journal.pone.0040694. Epub 2012 Jul 23.
Viruses infect humans and progress inside the body, leading to various diseases and complications. Phosphorylation of viral proteins, catalyzed by host kinases, plays crucial regulatory roles in enhancing viral replication and inhibiting normal host-cell functions. Given this biological importance, there is strong interest in identifying protein phosphorylation sites on human viruses. However, mass spectrometry-based experiments have proven expensive and labor-intensive. Furthermore, previous studies that identified phosphorylation sites in human viruses did not investigate the responsible kinases. We are therefore motivated to propose a new method for identifying protein phosphorylation sites on human viruses together with their kinase substrate specificities. The experimentally verified phosphorylation data were extracted from virPTM, a database containing 301 experimentally verified phosphorylation entries on 104 human kinase-phosphorylated virus proteins. To investigate kinase substrate specificities in viral protein phosphorylation sites, maximal dependence decomposition (MDD) is employed to cluster a large set of phosphorylation data into subgroups containing significantly conserved motifs. Experimentally verified human phosphorylation sites are collected from Phospho.ELM, grouped according to their kinase annotations, and compared with the virus MDD clusters. This comparison identifies human kinases such as CK2, PKB, CDK, and MAPK as potential kinases for catalyzing virus protein substrates, a finding supported by published literature. A profile hidden Markov model (HMM) is then trained as a predictive model for each subgroup. Five-fold cross-validation of the MDD-clustered HMMs yields an average accuracy of 84.93% for serine and 78.05% for threonine.
Furthermore, an independent test set collected from UniProtKB and Phospho.ELM is used to compare the predictive performance of the proposed method with that of three popular kinase-specific phosphorylation site prediction tools. In this independent test, the high sensitivity and specificity of the proposed method demonstrate the predictive effectiveness of the identified substrate motifs and the importance of investigating potential kinases for viral protein phosphorylation sites.
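The MDD clustering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the phosphorylation sites have already been cut into equal-length sequence windows centered on the phosphorylated residue, it uses raw amino acids rather than any grouping of residues, and it performs only a single split. The function names (`chi_square`, `most_dependent_position`, `split_once`) and the toy windows are hypothetical and chosen for illustration only.

```python
from collections import Counter
from itertools import product


def chi_square(pairs):
    """Pearson chi-square statistic for a list of (a, b) categorical pairs."""
    n = len(pairs)
    row = Counter(a for a, _ in pairs)
    col = Counter(b for _, b in pairs)
    obs = Counter(pairs)
    stat = 0.0
    for a, b in product(row, col):
        expected = row[a] * col[b] / n  # always > 0: a and b were both observed
        stat += (obs[(a, b)] - expected) ** 2 / expected
    return stat


def most_dependent_position(windows, skip=()):
    """Score each position by its summed dependence on all other positions.

    For position i, the indicator "residue equals the consensus at i" is
    tested against the residue at every other position j. Positions listed
    in `skip` (for example the fixed central S/T) are ignored.
    """
    positions = [i for i in range(len(windows[0])) if i not in skip]
    scores = {}
    for i in positions:
        consensus = Counter(w[i] for w in windows).most_common(1)[0][0]
        scores[i] = sum(
            chi_square([(w[i] == consensus, w[j]) for w in windows])
            for j in positions
            if j != i
        )
    best = max(scores, key=scores.get)
    return best, scores


def split_once(windows, skip=()):
    """Split windows on the consensus residue of the most dependent position."""
    best, _ = most_dependent_position(windows, skip)
    consensus = Counter(w[best] for w in windows).most_common(1)[0][0]
    with_consensus = [w for w in windows if w[best] == consensus]
    without_consensus = [w for w in windows if w[best] != consensus]
    return best, consensus, with_consensus, without_consensus


if __name__ == "__main__":
    # Toy 5-residue windows with the phosphorylated serine at index 2.
    toy = ["RASPK", "RGSPK", "RLSPK", "DASEE", "DGSEE", "DLSEE"]
    position, residue, group_a, group_b = split_once(toy, skip=(2,))
    print(position, residue, group_a, group_b)
```

In a full MDD procedure the split would be applied recursively to each subgroup until a minimum cluster size is reached or no significant dependence remains, and each resulting subgroup would then be used to train its own profile HMM.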