ILYCROsite: Identification of lysine crotonylation sites based on FCM-GRNN undersampling technique

Zuo Yun, Wan Minquan, Shen Yang, Wang Xinheng, He Wenying, Bi Yue, Liu Xiangrong, Deng Zhaohong

School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214000, China.

Comput Biol Chem. 2024 Dec;113:108212. doi: 10.1016/j.compbiolchem.2024.108212. Epub 2024 Sep 13.

Protein lysine crotonylation is an important post-translational modification that regulates various cellular activities. For example, histone crotonylation affects chromatin structure and promotes histone replacement. Identifying and understanding lysine crotonylation sites is therefore crucial in protein research. However, as the number of known non-histone crotonylation sites grows, existing classifiers based on traditional machine learning may run into performance limitations. To address this problem, and given the advantages of deep learning for sequence data analysis, this study presents a deep learning model for identifying crotonylation sites, built on a multilayer perceptron (MLP) with attention. First, three feature extraction strategies (Amino Acid Composition, K-mer, and distance-based residue features) were used to encode crotonylated and non-crotonylated sequences. Then, to balance the training dataset, the FCM-GRNN undersampling algorithm, which combines fuzzy clustering with a generalized regression neural network, was introduced. Finally, after comparing the experimental performance of various classification algorithms, an MLP combined with a stacked self-attention mechanism was selected to construct the prediction model, ILYCROsite. Independent testing and five-fold cross-validation showed that ILYCROsite performs well. Notably, on the independent test set, ILYCROsite achieved an AUC of 87.93%, which is significantly better than existing state-of-the-art models. In addition, SHAP (SHapley Additive exPlanations) values were used to analyze the importance of features and their impact on model predictions. To make the model easier for researchers to use, we also developed a prediction program that identifies crotonylation sites in a given protein sequence. The data and code for this program are available at: https://github.com/wmqskr/ILYCROsite.
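As a rough illustration of the first two encodings named in the abstract, the following Python sketch computes Amino Acid Composition and K-mer frequency vectors for a peptide window. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`aac`, `kmer`, `encode`), the choice of `k=2`, and the handling of non-standard characters (skipped) are all hypothetical, and the distance-based residue features are not shown because the abstract does not define them.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids; any other character (e.g. a padding
# symbol) is ignored. This handling is an assumption, not from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(seq):
    """Amino Acid Composition: fraction of each standard residue in seq."""
    residues = [r for r in seq if r in AMINO_ACIDS]
    counts = Counter(residues)
    total = len(residues)
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def kmer(seq, k=2):
    """Normalized frequency of every possible k-mer (20**k entries)."""
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # Keep only k-mers made entirely of standard residues.
    windows = [w for w in windows if all(c in AMINO_ACIDS for c in w)]
    counts = Counter(windows)
    total = len(windows)
    return [counts[v] / total if total else 0.0 for v in vocab]


def encode(seq, k=2):
    """Concatenate the AAC and k-mer vectors into one feature vector."""
    return aac(seq) + kmer(seq, k)


if __name__ == "__main__":
    peptide = "AAKKA"  # toy window; real windows are centered on a lysine
    features = encode(peptide)
    print(len(features))  # 20 AAC entries + 400 dipeptide entries
```

With `k=2` each sequence maps to a 420-dimensional vector (20 composition values plus 400 dipeptide frequencies), which could then be passed to an undersampling step and a classifier.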

Similar articles

MVNN-HNHC: A multi-view neural network for identification of human non-histone crotonylation sites.

Anal Biochem. 2024 Apr;687:115426. doi: 10.1016/j.ab.2023.115426. Epub 2023 Dec 22.

Adapt-Kcr: a novel deep learning framework for accurate prediction of lysine crotonylation sites based on learning embedding features and attention architecture.

Brief Bioinform. 2022 Mar 10;23(2). doi: 10.1093/bib/bbac037.

Prediction of protein crotonylation sites through LightGBM classifier based on SMOTE and elastic net.

Anal Biochem. 2020 Nov 15;609:113903. doi: 10.1016/j.ab.2020.113903. Epub 2020 Aug 15.

Using ATCLSTM-Kcr to predict and generate the human lysine crotonylation database.

J Proteomics. 2023 Jun 15;281:104905. doi: 10.1016/j.jprot.2023.104905. Epub 2023 Apr 12.

Glypred: Lysine Glycation Site Prediction via CCU-LightGBM-BiLSTM Framework with Multi-Head Attention Mechanism.

J Chem Inf Model. 2024 Aug 26;64(16):6699-6711. doi: 10.1021/acs.jcim.4c01034. Epub 2024 Aug 9.

MlyPredCSED: based on extreme point deviation compensated clustering combined with cross-scale convolutional neural networks to predict multiple lysine sites in human.

Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf189.

Improved Prediction Model of Protein Lysine Crotonylation Sites Using Bidirectional Recurrent Neural Networks.

J Proteome Res. 2022 Jan 7;21(1):265-273. doi: 10.1021/acs.jproteome.1c00848. Epub 2021 Nov 23.

nhKcr: a new bioinformatics tool for predicting crotonylation sites on human nonhistone proteins based on deep learning.

Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab146.

PreMLS: The undersampling technique based on ClusterCentroids to predict multiple lysine sites.

PLoS Comput Biol. 2024 Oct 22;20(10):e1012544. doi: 10.1371/journal.pcbi.1012544. eCollection 2024 Oct.
