Accurate prediction of potential druggable proteins based on genetic algorithm and Bagging-SVM ensemble classifier.

Affiliations

College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao 266061, China; Key Laboratory of Synthetic Biology, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200032, China.

Publication information

Artif Intell Med. 2019 Jul;98:35-47. doi: 10.1016/j.artmed.2019.07.005. Epub 2019 Jul 19.

DOI:10.1016/j.artmed.2019.07.005

PMID:31521251

Abstract

Discovering and accurately locating drug targets is of great significance for the research and development of new drugs. As an alternative to traditional drug development, machine learning algorithms can predict drug targets by mining existing data. Because this approach is fast and inexpensive, it has received increasing attention in recent years. In this paper, we propose a novel method for predicting druggable proteins. First, features are extracted from the protein sequences by combining Chou's pseudo amino acid composition (PseAAC), dipeptide composition (DPC) and reduced sequence (RS), giving a 591-dimensional feature representation of the drug target dataset. Then, the features of the druggable protein dataset are selected with a genetic algorithm (GA). Finally, Bagging ensemble learning is used to improve the SVM classifier and obtain the final prediction model. Under 5-fold cross-validation the prediction accuracy reaches 93.78%, and the method is compared with other state-of-the-art predictive methods. The results indicate that the proposed method has high reference value for the prediction of potential drug targets and is expected to play a key role in drug research and development. The source code and all datasets are available at https://github.com/QUST-AIBBDRC/GA-Bagging-SVM.
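To make one of the three feature types concrete, the following is a minimal sketch of dipeptide composition (DPC) only, written in plain Python. It is not taken from the authors' repository. It assumes the usual definition of DPC as the relative frequency of each of the 20 x 20 = 400 adjacent residue pairs. The function name `dipeptide_composition`, the toy sequence, and the choice to skip pairs containing non-standard residues are all illustrative assumptions. The abstract does not say how the 591 dimensions divide among PseAAC, DPC and RS, so no such breakdown is implied here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# All 400 ordered residue pairs, in a fixed order so vectors are comparable.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-dimensional DPC vector for one protein sequence.

    Each entry is the count of one adjacent residue pair divided by the
    number of adjacent pairs made of standard residues. Skipping pairs
    that contain a non-standard symbol (for example 'X') is an assumption
    of this sketch, not something stated in the abstract.
    """
    counts = dict.fromkeys(DIPEPTIDES, 0)
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Empty or too-short sequence: return an all-zero vector.
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


# Hypothetical toy sequence, used only to show the call.
features = dipeptide_composition("MKVLAAGIVK")
print(len(features))
```

In a pipeline like the one the abstract describes, a vector of this kind would be concatenated with the PseAAC and RS features before feature selection and classification.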

Similar articles

Predicting protein submitochondrial locations by incorporating the pseudo-position specific scoring matrix into the general Chou's pseudo-amino acid composition.

J Theor Biol. 2018 Aug 7;450:86-103. doi: 10.1016/j.jtbi.2018.04.026. Epub 2018 Apr 18.

Predicting protein-protein interactions by fusing various Chou's pseudo components and using wavelet denoising approach.

J Theor Biol. 2019 Feb 7;462:329-346. doi: 10.1016/j.jtbi.2018.11.011. Epub 2018 Nov 16.

DeepStack-DTIs: Predicting Drug-Target Interactions Using LightGBM Feature Selection and Deep-Stacked Ensemble Classifier.

Interdiscip Sci. 2022 Jun;14(2):311-330. doi: 10.1007/s12539-021-00488-7. Epub 2021 Nov 3.

Data-driven diagnosis of spinal abnormalities using feature selection and machine learning algorithms.

PLoS One. 2020 Feb 6;15(2):e0228422. doi: 10.1371/journal.pone.0228422. eCollection 2020.

DPP-PseAAC: A DNA-binding protein prediction model using Chou's general PseAAC.

J Theor Biol. 2018 Sep 7;452:22-34. doi: 10.1016/j.jtbi.2018.05.006. Epub 2018 May 16.

Detecting Succinylation sites from protein sequences using ensemble support vector machine.

BMC Bioinformatics. 2018 Jun 25;19(1):237. doi: 10.1186/s12859-018-2249-4.

Accurate prediction of subcellular location of apoptosis proteins combining Chou's PseAAC and PsePSSM based on wavelet denoising.

Oncotarget. 2017 Nov 21;8(64):107640-107665. doi: 10.18632/oncotarget.22585. eCollection 2017 Dec 8.

Prediction of protein structural class for low-similarity sequences using Chou's pseudo amino acid composition and wavelet denoising.

J Mol Graph Model. 2017 Sep;76:260-273. doi: 10.1016/j.jmgm.2017.07.012. Epub 2017 Jul 14.

Predicting drug-target interactions using Lasso with random forest based on evolutionary information and chemical structure.

Genomics. 2019 Dec;111(6):1839-1852. doi: 10.1016/j.ygeno.2018.12.007. Epub 2018 Dec 11.

Cited by

Automated drug design for druggable target identification using integrated stacked autoencoder and hierarchically self-adaptive optimization.

Sci Rep. 2025 Sep 1;15(1):32205. doi: 10.1038/s41598-025-18091-x.

DrugProtAI: A machine learning-driven approach for predicting protein druggability through feature engineering and robust partition-based ensemble methods.

Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf330.

A genetic algorithm-based ensemble model for efficiently identifying interleukin 6 inducing peptides.

Sci Rep. 2025 Jul 1;15(1):21213. doi: 10.1038/s41598-025-05491-2.

DrugTar improves druggability prediction by integrating large language models and gene ontologies.

Bioinformatics. 2025 Jul 1;41(7). doi: 10.1093/bioinformatics/btaf360.

Repurposing FDA-Approved Drugs Against Potential Drug Targets Involved in Brain Inflammation Contributing to Alzheimer's Disease.

Targets (Basel). 2024 Dec;2(4):446-469. doi: 10.3390/targets2040025. Epub 2024 Dec 4.

Comprehensive Research on Druggable Proteins: From PSSM to Pre-Trained Language Models.

Int J Mol Sci. 2024 Apr 19;25(8):4507. doi: 10.3390/ijms25084507.

DPI_CDF: druggable protein identifier using cascade deep forest.

BMC Bioinformatics. 2024 Apr 5;25(1):145. doi: 10.1186/s12859-024-05744-3.

Empirical comparison and analysis of machine learning-based approaches for druggable protein identification.

EXCLI J. 2023 Aug 29;22:915-927. doi: 10.17179/excli2023-6410. eCollection 2023.

PINNED: identifying characteristics of druggable human proteins using an interpretable neural network.

J Cheminform. 2023 Jul 19;15(1):64. doi: 10.1186/s13321-023-00735-7.

QuoteTarget: A sequence-based transformer protein language model to identify potentially druggable protein targets.

Protein Sci. 2023 Feb;32(2):e4555. doi: 10.1002/pro.4555.
