Fang Chun, Moriwaki Yoshitaka, Tian Aikui, Li Caihong, Shimizu Kentaro
* Department of Computer Science and Engineering, Shandong University of Technology, Shandong 255049, P. R. China.
† Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo 113-8657, Japan.
J Bioinform Comput Biol. 2019 Feb;17(1):1950004. doi: 10.1142/S0219720019500045.
Molecular recognition features (MoRFs) are key functional regions of intrinsically disordered proteins (IDPs); they play important roles in the molecular interaction networks of cells and are implicated in many serious human diseases. Identifying MoRFs is therefore essential both for functional studies of IDPs and for drug design. This study applies cutting-edge machine learning techniques to develop a powerful model for improving MoRF prediction. We propose a method named en_DCNNMoRF (ensemble deep convolutional neural network-based MoRF predictor), which combines the outputs of two independent deep convolutional neural network (DCNN) classifiers that use different features. The first, DCNNMoRF1, describes protein sequences using a position-specific scoring matrix (PSSM) and 22 types of amino acid-related factors. The second, DCNNMoRF2, describes protein sequences using the PSSM and 13 types of amino acid indexes. Both single classifiers use a DCNN with a novel two-dimensional attention mechanism, and an averaging strategy is added to further process the output probabilities of each DCNN model. Finally, en_DCNNMoRF combines the two models by averaging their final scores. Compared with other well-known tools on the same datasets, the proposed method achieved accuracy comparable to that of state-of-the-art methods. The web server is freely accessible at http://vivace.bi.a.u-tokyo.ac.jp:8008/fang/en_MoRFs.php.
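The abstract states that each classifier concatenates the PSSM with a set of amino acid descriptors and that en_DCNNMoRF averages the final scores of its two models, but it does not detail the per-model averaging strategy. The minimal sketch below illustrates these post-processing steps under stated assumptions: a standard 20-column PSSM per residue, and a centred sliding-window mean as one plausible reading of the "average strategy". The function names `concat_features`, `window_average` and `ensemble_average`, the window width and the example probabilities are hypothetical; the DCNN and its two-dimensional attention mechanism are not reproduced here, and this is not the authors' implementation.

```python
from typing import List, Sequence


def concat_features(pssm: Sequence[Sequence[float]],
                    props: Sequence[Sequence[float]]) -> List[List[float]]:
    """Join each residue's PSSM row with its property row.

    Assumes a 20-column PSSM; with 22 amino acid-related factors this
    gives 42 values per residue, with 13 amino acid indexes 33 values.
    """
    if len(pssm) != len(props):
        raise ValueError("PSSM and property rows must cover the same residues")
    return [list(p) + list(q) for p, q in zip(pssm, props)]


def window_average(probs: Sequence[float], half_width: int = 2) -> List[float]:
    """Hypothetical 'average strategy': replace each residue's probability
    by the mean over a centred window, truncated at the sequence ends."""
    n = len(probs)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        window = probs[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def ensemble_average(scores1: Sequence[float],
                     scores2: Sequence[float]) -> List[float]:
    """Combine two per-residue score vectors by simple averaging,
    as the abstract describes for the final en_DCNNMoRF score."""
    if len(scores1) != len(scores2):
        raise ValueError("score vectors must have equal length")
    return [(a + b) / 2 for a, b in zip(scores1, scores2)]


if __name__ == "__main__":
    # Hypothetical per-residue MoRF probabilities from the two classifiers.
    p1 = [0.2, 0.9, 0.8, 0.1]
    p2 = [0.4, 0.7, 0.6, 0.3]
    final = ensemble_average(window_average(p1, 1), window_average(p2, 1))
    print([round(s, 3) for s in final])
```

Averaging rather than, say, a learned weighting keeps the ensemble free of extra parameters, so neither model can dominate the combined score unless its probabilities are systematically more extreme.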