Zhang Luna, Zou Yang, He Ningning, Chen Yu, Chen Zhen, Li Lei
School of Data Science and Software Engineering, Qingdao University, Qingdao, China.
School of Basic Medicine, Qingdao University, Qingdao, China.
Front Cell Dev Biol. 2020 Sep 9;8:580217. doi: 10.3389/fcell.2020.580217. eCollection 2020.
As a novel type of post-translational modification, lysine 2-hydroxyisobutyrylation (Khib) plays an important role in gene transcription and signal transduction. Recognizing Khib sites is the essential first step toward understanding its regulatory mechanism. Thousands of Khib sites have been experimentally verified across five different species. However, only a couple of traditional machine-learning algorithms have been developed to predict Khib sites, and only for a limited number of species; a general prediction algorithm is lacking. We constructed a deep-learning algorithm based on a convolutional neural network with the one-hot encoding approach, dubbed CNNOH. It compares favorably with traditional machine-learning models and other deep-learning models across different species, in terms of both cross-validation and independent tests. The area under the ROC curve (AUC) values for CNNOH ranged from 0.82 to 0.87 for different organisms, which is superior to the currently available Khib predictors. Moreover, we developed a general model based on the integrated data from multiple species; it showed great universality and effectiveness, with AUC values in the range of 0.79-0.87. Accordingly, we constructed the online prediction tool DeepKhib for easily identifying Khib sites, which includes both species-specific and general models. DeepKhib is available at http://www.bioinfogo.org/DeepKhib.
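The one-hot encoding step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `extract_windows` and `one_hot`, the padding symbol `-`, and the flank size are all assumptions chosen for illustration. The sketch cuts a fixed-length window around each lysine (K) in a protein sequence and converts it into a binary matrix of the kind a convolutional network could take as input.

```python
# Illustrative sketch only; names, padding symbol, and window size are
# assumptions, not details taken from the DeepKhib paper.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard residues
PAD = "-"                              # hypothetical padding/unknown symbol
ALPHABET = AMINO_ACIDS + PAD
INDEX = {residue: j for j, residue in enumerate(ALPHABET)}


def extract_windows(sequence, flank=3):
    """Yield (1-based position, window) for every lysine in the sequence.

    Each window has length 2 * flank + 1 with the lysine at its center;
    positions beyond the sequence ends are filled with PAD.
    """
    padded = PAD * flank + sequence + PAD * flank
    for i, residue in enumerate(sequence):
        if residue == "K":
            yield i + 1, padded[i : i + 2 * flank + 1]


def one_hot(window):
    """Encode a window as a list of rows, one row per residue.

    Each row has len(ALPHABET) entries with a single 1 at the residue's
    index. Residues outside the alphabet are mapped to PAD.
    """
    matrix = []
    for residue in window:
        row = [0] * len(ALPHABET)
        row[INDEX.get(residue, INDEX[PAD])] = 1
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    for position, window in extract_windows("MKTAYK", flank=2):
        encoded = one_hot(window)
        print(position, window, len(encoded), "x", len(encoded[0]))
```

Each encoded window is a matrix with one row per residue and 21 columns (20 amino acids plus the padding symbol). A predictor of this kind would be trained on such matrices with labels marking which lysines are experimentally verified Khib sites.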