Hussam Al-Barakati, Niraj Thapa, Hiroto Saigo, Kaushik Roy, Robert H. Newman, Dukka KC
Department of Computational Science and Engineering, North Carolina A&T State University, Greensboro, NC, USA.
Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan.
Comput Struct Biotechnol J. 2020 Mar 4;18:852-860. doi: 10.1016/j.csbj.2020.02.012. eCollection 2020.
Malonylation, which has recently emerged as an important lysine modification, regulates diverse biological activities and has been implicated in several pervasive disorders, including cardiovascular disease and cancer. However, conventional global proteomics analysis using tandem mass spectrometry can be time-consuming, expensive and technically challenging. Therefore, to complement and extend existing experimental methods for malonylation site identification, we developed two novel computational methods for malonylation site prediction: RF-MaloSite, based on the random forest algorithm, and DL-MaloSite, based on deep learning. DL-MaloSite requires only the primary amino acid sequence as input, whereas RF-MaloSite utilizes a diverse set of biochemical, physicochemical and sequence-based features. Systematic assessment of performance metrics suggests that both RF-MaloSite and DL-MaloSite perform well in all metrics tested, and particularly well in accuracy, sensitivity and overall method performance as assessed by the Matthews correlation coefficient (MCC). For instance, RF-MaloSite achieved MCC scores of 0.42 and 0.40 using 10-fold cross-validation and an independent test set, respectively, while DL-MaloSite achieved MCC scores of 0.51 and 0.49 using 10-fold cross-validation and an independent test set, respectively. Importantly, both methods exhibited efficiency scores on par with or better than those achieved by existing malonylation site prediction methods. The identification of these sites may also provide important insights into the mechanisms of crosstalk between malonylation and other lysine modifications, such as acetylation, glutarylation and succinylation. To facilitate their use, both methods have been made freely available to the research community at https://github.com/dukkakc/DL-MaloSite-and-RF-MaloSite.
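For readers less familiar with the metrics quoted in the abstract, the following minimal Python sketch shows how accuracy, sensitivity, specificity and the Matthews correlation coefficient are computed from the four confusion-matrix counts of a binary site predictor (malonylated vs. non-malonylated lysine). It uses the standard textbook definitions and is not taken from the authors' repository; the function name `confusion_metrics` and the example counts are hypothetical.

```python
import math


def confusion_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    Hypothetical helper for illustration only; not the RF-MaloSite or
    DL-MaloSite evaluation code.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Sensitivity: fraction of true malonylation sites that are recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-malonylated lysines correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report MCC as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts, not results from the paper.
    print(confusion_metrics(tp=80, tn=70, fp=30, fn=20))
    # A trivial predictor that labels every lysine as negative on an
    # imbalanced set scores high accuracy but an MCC of 0.
    print(confusion_metrics(tp=0, tn=90, fp=0, fn=10))
```

With the first set of hypothetical counts, accuracy is 0.75 and MCC is about 0.50; the second call illustrates why MCC, which uses all four cells of the confusion matrix, is a more informative single summary than accuracy when positive sites are rare.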