Computer Engineering Department, Bilkent University, Ankara 06800, Turkey.
College of Information and Computer Sciences, University of Massachusetts, Amherst, MA 01003, USA.
Bioinformatics. 2020 Jun 1;36(12):3652-3661. doi: 10.1093/bioinformatics/btaa013.
Protein phosphorylation is a key regulator of protein function in signal transduction pathways. Kinases are the enzymes that catalyze the phosphorylation of other proteins in a target-specific manner. Dysregulation of phosphorylation is associated with many diseases, including cancer. Although advances in phosphoproteomics enable the identification of phosphosites at the proteome level, most of the phosphoproteome remains uncharacterized: more than 95% of the reported human phosphosites have no known kinase. Determining which kinase phosphorylates a given site remains an experimental challenge. Existing computational methods need several known target sites for a kinase to make accurate kinase-specific predictions, yet for a large number of kinases only a few target sites, or none, have been reported.
We present DeepKinZero, the first zero-shot learning approach to predict the kinase acting on a phosphosite for kinases with no known phosphosite information. DeepKinZero transfers knowledge from kinases with many known target phosphosites to those kinases with no known sites through a zero-shot learning model. The kinase-specific positional amino acid preferences are learned using a bidirectional recurrent neural network. We show that DeepKinZero achieves significant improvement in accuracy for kinases with no known phosphosites in comparison to the baseline model and other methods available. By expanding our knowledge on understudied kinases, DeepKinZero can help to chart the phosphoproteome atlas.
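The zero-shot idea described above can be illustrated with a minimal sketch. This is not the DeepKinZero implementation: it assumes a simple bilinear compatibility score between a site vector and a kinase embedding, replaces the bidirectional recurrent network with a plain one-hot encoding of the peptide window, and uses invented two-dimensional kinase embeddings. The kinase names (`K_A` to `K_D`), the peptide windows, and all function names are hypothetical.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def encode_site(window):
    """One-hot encode a peptide window, position by position.
    Assumption: a stand-in for a learned encoder such as a bidirectional RNN."""
    vec = []
    for residue in window:
        vec.extend(1.0 if residue == aa else 0.0 for aa in ALPHABET)
    return vec

def score(site_vec, W, kinase_vec):
    """Bilinear compatibility x^T W y between a site and a kinase embedding."""
    return sum(
        x * sum(w * y for w, y in zip(row, kinase_vec))
        for x, row in zip(site_vec, W)
        if x != 0.0
    )

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train(examples, kinase_embs, epochs=20, lr=0.5):
    """Fit W by SGD on softmax cross-entropy over the seen kinases only."""
    names = sorted(kinase_embs)
    dim_x = len(encode_site(examples[0][0]))
    dim_y = len(kinase_embs[names[0]])
    W = [[0.0] * dim_y for _ in range(dim_x)]
    for _ in range(epochs):
        for window, label in examples:
            x = encode_site(window)
            probs = softmax([score(x, W, kinase_embs[k]) for k in names])
            # Gradient of the loss w.r.t. W is x * err^T, where
            # err = sum_k (p_k - 1[k == label]) * y_k
            err = [0.0] * dim_y
            for p, k in zip(probs, names):
                coeff = p - (1.0 if k == label else 0.0)
                for j, y in enumerate(kinase_embs[k]):
                    err[j] += coeff * y
            for i, xi in enumerate(x):
                if xi != 0.0:
                    for j in range(dim_y):
                        W[i][j] -= lr * xi * err[j]
    return W

def predict(window, W, candidate_embs):
    """Return the candidate kinase whose embedding is most compatible with the site."""
    x = encode_site(window)
    return max(candidate_embs, key=lambda k: score(x, W, candidate_embs[k]))

# Hypothetical toy data: two kinases with known sites, two with none.
seen = {"K_A": [1.0, 0.0], "K_B": [0.0, 1.0]}
unseen = {"K_C": [0.9, 0.1], "K_D": [0.1, 0.9]}
train_sites = [
    ("RSP", "K_A"), ("ESD", "K_B"), ("RSA", "K_A"),
    ("DSE", "K_B"), ("RTP", "K_A"), ("ETD", "K_B"),
]

W = train(train_sites, seen)
print(predict("RSG", W, unseen))
print(predict("ESE", W, unseen))
```

Because the score depends on a kinase only through its embedding, a kinase with no training sites can still be ranked at prediction time, provided its embedding lies in the same space as those of the kinases seen during training.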
The source code is available at https://github.com/Tastanlab/DeepKinZero.
Supplementary data are available at Bioinformatics online.