Tng Sian Soo, Le Nguyen Quoc Khanh, Yeh Hui-Yuan, Chua Matthew Chin Heng
Institute of Systems Science, National University of Singapore, 29 Heng Mui Keng Terrace, Singapore 119620, Singapore.
Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei 106, Taiwan.
J Proteome Res. 2022 Jan 7;21(1):265-273. doi: 10.1021/acs.jproteome.1c00848. Epub 2021 Nov 23.
Histone lysine crotonylation (Kcr) is a post-translational modification of histone proteins that is involved in processes such as the regulation of gene transcription, acute and chronic kidney injury, spermatogenesis, depression, and cancer. Identifying Kcr sites in proteins is important for characterizing and regulating primary biological mechanisms. Because traditional wet-lab experiments are time-consuming and costly, computational approaches such as machine learning and deep learning algorithms have emerged in recent years. In this study, we propose a deep learning model based on a recurrent neural network (RNN), termed Sohoko-Kcr, for the prediction of Kcr sites. Using an embedded encoding of the peptide sequences, we investigate the efficiency of RNN-based models, namely long short-term memory (LSTM), bidirectional LSTM (BiLSTM), and bidirectional gated recurrent unit (BiGRU) networks, through cross-validation and independent tests. We also compared Sohoko-Kcr with other published tools, using 3-fold, 5-fold, and 10-fold cross-validation and independent test sets to verify the efficiency of our model. The results show that the BiGRU model consistently displayed outstanding performance and computational efficiency. Based on the proposed model, a web server called Sohoko-Kcr was deployed for free use and is accessible at https://sohoko-research-9uu23.ondigitalocean.app.
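To illustrate the data preparation that the abstract describes, the following is a minimal sketch, not the authors' implementation. It extracts fixed-length peptide windows centered on lysine residues, maps residues to integer indices of the kind an embedding layer would consume, and builds k-fold splits. The window half-width `flank`, the padding symbol `X`, the index assignment, and all function names are assumptions made for illustration; the abstract does not specify them, and the BiGRU network itself is not shown.

```python
import random

# Assumption: the 20 standard amino acids, with index 0 reserved for padding
# or unknown residues. The real vocabulary used by Sohoko-Kcr is not given.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_TO_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}
PAD_INDEX = 0


def lysine_windows(sequence, flank=3):
    """Yield (position, window) for each lysine (K) in the sequence.

    Each window has length 2 * flank + 1 with the lysine at its center.
    Sequence ends are padded with 'X' (hypothetical padding symbol).
    """
    padded = "X" * flank + sequence + "X" * flank
    for i, residue in enumerate(sequence):
        if residue == "K":
            yield i, padded[i : i + 2 * flank + 1]


def encode_peptide(peptide):
    """Map each residue to an integer index; unknown symbols map to padding."""
    return [AA_TO_INDEX.get(residue, PAD_INDEX) for residue in peptide.upper()]


def k_fold_indices(n_samples, k, seed=0):
    """Return k (train_indices, validation_indices) pairs over shuffled samples."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for held_out in range(k):
        train = sorted(
            idx for j, fold in enumerate(folds) if j != held_out for idx in fold
        )
        splits.append((train, sorted(folds[held_out])))
    return splits


if __name__ == "__main__":
    for position, window in lysine_windows("MKTAK", flank=2):
        print(position, window, encode_peptide(window))
    for k in (3, 5, 10):
        print(k, [len(val) for _, val in k_fold_indices(30, k)])
```

In a full pipeline, the integer vectors would be passed to an embedding layer followed by a recurrent layer such as a BiGRU, and each train/validation pair would be used to fit and evaluate one model.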