Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China.
Department of Electrical Engineering and Computer Science, Informatics Institute, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA.
Bioinformatics. 2017 Dec 15;33(24):3909-3916. doi: 10.1093/bioinformatics/btx496.
Computational methods for phosphorylation site prediction play important roles in protein function studies and experimental design. Most existing methods rely on hand-crafted feature extraction, which may yield incomplete or biased features. Deep learning can automatically discover complex representations of phosphorylation patterns from raw sequences, and therefore offers a powerful tool for improving phosphorylation site prediction.
We present MusiteDeep, the first deep-learning framework for predicting general and kinase-specific phosphorylation sites. MusiteDeep takes raw sequence data as input and uses convolutional neural networks with a novel two-dimensional attention mechanism. On the benchmark data, it achieves a relative improvement of over 50% in the area under the precision-recall curve for general phosphorylation site prediction, and it obtains competitive results in kinase-specific prediction compared with other well-known tools.
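To make "raw sequence data as input" concrete, the following is a minimal sketch of how a protein sequence might be turned into fixed-length, one-hot encoded windows centered on candidate residues before being passed to a convolutional network. It is not MusiteDeep's actual preprocessing code: the function names, the padding symbol, the flank size of 16, the default choice of serine and threonine as candidates, and the mapping of unknown residues to the padding symbol are all assumptions made for illustration.

```python
# Illustrative sketch only; names and defaults are hypothetical,
# not taken from the MusiteDeep code base.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
PAD = "-"                             # assumed padding symbol
ALPHABET = AMINO_ACIDS + PAD
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def candidate_windows(sequence, residues="ST", flank=16):
    """Yield (position, window) for every candidate residue.

    Each window has length 2 * flank + 1 and is centered on the
    candidate; positions beyond the sequence ends are padded.
    """
    padded = PAD * flank + sequence + PAD * flank
    for pos, aa in enumerate(sequence):
        if aa in residues:
            # pos in `sequence` corresponds to pos + flank in `padded`,
            # so the window starts at index pos in `padded`.
            yield pos, padded[pos:pos + 2 * flank + 1]


def one_hot(window):
    """Encode a window as a list of one-hot rows over ALPHABET.

    Symbols outside ALPHABET (for example 'X') are mapped to PAD,
    which is an assumption of this sketch.
    """
    rows = []
    for aa in window:
        row = [0] * len(ALPHABET)
        row[INDEX.get(aa, INDEX[PAD])] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    for pos, window in candidate_windows("MKSPTA", flank=2):
        print(pos, window, len(one_hot(window)))
```

Each encoded window is a matrix with one row per sequence position and one column per alphabet symbol, which is the kind of input a one-dimensional convolution over the sequence axis can consume directly, without hand-crafted features.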
MusiteDeep is provided as an open-source tool available at https://github.com/duolinwang/MusiteDeep.
Supplementary data are available at Bioinformatics online.