Fang Chun, Moriwaki Yoshitaka, Li Caihong, Shimizu Kentaro
Department of Computer Science and Engineering, Shandong University of Technology, Shandong 255049, P. R. China.
Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo 113-8657, Japan.
J Bioinform Comput Biol. 2019 Dec;17(6):1940015. doi: 10.1142/S0219720019400158.
Molecular recognition features (MoRFs) usually act as "hub" sites in the interaction networks of intrinsically disordered proteins (IDPs). Because a growing number of serious diseases have been found to be associated with disordered proteins, identifying MoRFs has become increasingly important. In this study, we propose an ensemble learning strategy, named MoRFPred_en, to predict MoRFs from protein sequences. The approach combines four submodels, each using different sequence-derived features: a model based on a multichannel one-dimensional convolutional neural network (CNN_1D multichannel), two models based on deep two-dimensional convolutional neural networks (DCNN_2D), and a model based on a support vector machine (SVM). Compared on the same datasets, MoRFPred_en outperformed existing state-of-the-art MoRF prediction methods, achieving an AUC of 0.762 on the VALIDATION419 dataset, 0.795 on the TEST45 dataset, and 0.776 on the TEST49 dataset. Availability: http://vivace.bi.a.u-tokyo.ac.jp:8008/fang/MoRFPred_en.php.
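The abstract does not state how the outputs of the four submodels are combined, so the following is only a minimal sketch of one common option: equal-weight averaging of per-residue scores, followed by a rank-based AUC computation. The function names `ensemble_scores` and `roc_auc`, the submodel keys, and the toy scores are all hypothetical and are not taken from MoRFPred_en.

```python
from typing import Dict, List, Sequence


def ensemble_scores(submodel_scores: Dict[str, Sequence[float]]) -> List[float]:
    """Average per-residue scores across submodels.

    Equal weighting is an assumption made for illustration; the abstract
    does not say how MoRFPred_en merges its submodels.
    """
    columns = list(submodel_scores.values())
    if not columns:
        raise ValueError("at least one submodel is required")
    length = len(columns[0])
    if any(len(col) != length for col in columns):
        raise ValueError("all submodels must score the same residues")
    return [sum(col[i] for col in columns) / len(columns) for i in range(length)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy per-residue scores for an 8-residue sequence (invented values).
    toy_scores = {
        "cnn_1d_multichannel": [0.1, 0.4, 0.8, 0.7, 0.6, 0.3, 0.2, 0.1],
        "dcnn_2d_a": [0.2, 0.3, 0.6, 0.9, 0.5, 0.6, 0.1, 0.2],
        "dcnn_2d_b": [0.3, 0.2, 0.7, 0.6, 0.4, 0.5, 0.3, 0.1],
        "svm": [0.2, 0.5, 0.5, 0.8, 0.7, 0.2, 0.2, 0.2],
    }
    # 1 marks a MoRF residue, 0 a non-MoRF residue (invented labels).
    toy_labels = [0, 0, 1, 1, 1, 0, 0, 0]
    combined = ensemble_scores(toy_scores)
    print("ensemble AUC on toy data:", roc_auc(toy_labels, combined))
```

The pairwise AUC loop is quadratic in the number of residues, which is fine for a demonstration; for datasets the size of those named above, a sorted-rank implementation or a library routine would be the more practical choice.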