Wang Xiaofeng, Yan Renxiang, Chen Yong-Zi, Wang Yongji
College of Mathematics and Computer Sciences, Shanxi Normal University, Linfen, 041004, China.
School of Biological Sciences and Engineering, Fujian Key Laboratory of Marine Enzyme Engineering, Fuzhou University, Fuzhou, 350002, China.
Plant Mol Biol. 2021 Apr;105(6):601-610. doi: 10.1007/s11103-020-01112-w. Epub 2021 Feb 1.
We developed two CNNs for predicting ubiquitination sites in Arabidopsis thaliana, demonstrated their competitive performance, analyzed amino acid physicochemical properties and the CNN structures, and predicted ubiquitination sites across the Arabidopsis proteome. As an important posttranslational protein modification, ubiquitination plays critical roles in plant physiology, including growth and development, responses to biotic and abiotic stress, and metabolism. Many ubiquitination site prediction models have been developed for human, mouse and yeast. However, few models exist for predicting ubiquitination sites in the plant Arabidopsis thaliana. We therefore propose two convolutional neural network (CNN) based models for predicting ubiquitination sites in A. thaliana. The two models reach AUC (area under the ROC curve) values of 0.924 and 0.913, respectively, in five-fold cross-validation, and 0.921 and 0.914, respectively, in an independent test, outperforming other models and demonstrating their competitive edge. We analyze in depth the amino acid physicochemical properties of the sequence regions neighboring the ubiquitination sites, and study the influence of the CNN structure on prediction performance. Potential ubiquitination sites in the whole Arabidopsis proteome are predicted using the two CNN models. The source code, the training and test datasets, and the predicted ubiquitination sites in the Arabidopsis proteome are available on GitHub ( http://github.com/nongdaxiaofeng/CNNAthUbi ) for interested users.
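To make the workflow in the abstract more concrete, the sketch below shows one common way such a predictor could be fed and scored: it cuts a fixed-length window around every lysine (K) in a protein sequence, one-hot encodes the window as CNN input, and computes AUC from labels and predicted scores. It is an illustration only, not the authors' implementation. The function names, the flank length of 15 residues, the padding symbol `X`, and the one-hot encoding are all assumptions that do not come from the abstract; the actual features and architecture are in the GitHub repository.

```python
from typing import List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"  # hypothetical padding symbol for positions past the sequence ends


def lysine_windows(seq: str, flank: int = 15) -> List[Tuple[int, str]]:
    """Return (1-based position, window) for every lysine in seq.

    Each window has length 2 * flank + 1 with the lysine at its center.
    The flank length is an assumed value, not taken from the paper.
    """
    padded = PAD * flank + seq + PAD * flank
    windows = []
    for i, residue in enumerate(seq):
        if residue == "K":
            # seq[i] sits at padded[i + flank], so this slice is centered on it
            windows.append((i + 1, padded[i : i + 2 * flank + 1]))
    return windows


def one_hot(window: str) -> List[List[int]]:
    """Encode a window as a len(window) x 20 binary matrix.

    Padding or non-standard residues become all-zero rows.
    """
    return [[1 if aa == residue else 0 for aa in AMINO_ACIDS] for residue in window]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Equals the probability that a random positive scores higher than a
    random negative, with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    for position, window in lysine_windows("MKTAK", flank=2):
        print(position, window, len(one_hot(window)))
    print(auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]))
```

In a five-fold cross-validation setting, `auc` would be applied to the held-out fold's labels and the model's scores for that fold; the pairwise formulation is quadratic in the number of sites, so a library routine would be preferable for proteome-scale data.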