Identification of human drug targets using machine-learning algorithms.

Author information

Kumari Priyanka, Nath Abhigyan, Chaube Radha

Affiliations

Bioinformatics Section, Mahila Mahavidyalaya, Banaras Hindu University, Varanasi 221005, India.

Zoology/Bioinformatic Section, Mahila Mahavidyalaya, Banaras Hindu University, Varanasi 221005, India.

Publication information

Comput Biol Med. 2015 Jan;56:175-81. doi: 10.1016/j.compbiomed.2014.11.008. Epub 2014 Nov 20.

DOI:10.1016/j.compbiomed.2014.11.008

PMID:25437231

Abstract

Identification of potential drug targets is a crucial task in the drug-discovery pipeline. Successful identification of candidate drug targets across entire genomes is very useful, and computational prediction methods can speed up this process. In the current work we have developed a sequence-based prediction method for identifying human drug target proteins and discriminating them from human non-drug target proteins. The predictive models are trained on sequence-based features: amino acid composition, amino acid property group composition, and dipeptide composition. The classification of human drug target proteins is a classic example of class imbalance. We addressed this issue by using SMOTE (Synthetic Minority Over-sampling Technique) as a preprocessing step to balance the training data to a 1:1 ratio between drug targets (minority samples) and non-drug targets (majority samples). Using the ensemble classification method Rotation Forest, together with the ReliefF feature-selection technique to select the optimal subset of salient features, the best model achieved 87.1% sensitivity, 83.6% specificity, 85.3% accuracy, and a Matthews correlation coefficient (MCC) of 0.71 in a tenfold stratified cross-validation test. The identified optimal feature subset may help in assessing the compositional patterns of human drug targets. For further validation, in a rigorous leave-one-out cross-validation test, the model achieved 88.1% sensitivity, 83.0% specificity, 85.5% accuracy, and an MCC of 0.712. The proposed method was also tested on a second dataset, on which the pipeline gave promising results. We suggest that the present approach can be applied as a complementary tool to existing methods for novel drug target prediction.
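The amino acid composition (AAC) and dipeptide composition (DPC) features named in the abstract are residue and adjacent-pair frequencies. The sketch below is illustrative only, not the authors' code: it assumes the 20 standard residues, skips non-standard letters (and any dipeptide containing one), and omits the property-group composition because the abstract does not say which residue grouping was used. The function names are hypothetical.

```python
from itertools import product
from typing import Dict, List

# The 20 standard amino acids; other letters (X, B, Z, U, ...) are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq: str) -> Dict[str, float]:
    """Fraction of each standard residue among the standard residues in seq."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    n = len(residues)
    return {aa: (residues.count(aa) / n if n else 0.0) for aa in AMINO_ACIDS}


def dipeptide_composition(seq: str) -> Dict[str, float]:
    """Fraction of each of the 400 dipeptides among adjacent standard-residue pairs."""
    s = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    n = 0
    for i in range(len(s) - 1):
        pair = s[i:i + 2]
        # Only count pairs where both residues are standard.
        if pair[0] in AMINO_ACIDS and pair[1] in AMINO_ACIDS:
            counts[pair] += 1
            n += 1
    return {dp: (c / n if n else 0.0) for dp, c in counts.items()}


def composition_features(seq: str) -> List[float]:
    """420-dimensional vector: 20 AAC values followed by 400 DPC values."""
    aac = amino_acid_composition(seq)
    dpc = dipeptide_composition(seq)
    return [aac[aa] for aa in AMINO_ACIDS] + [dpc[dp] for dp in DIPEPTIDES]
```

For example, `amino_acid_composition("ACDA")` assigns 0.5 to A and 0.25 each to C and D, and `composition_features` concatenates both encodings into one fixed-length vector per protein, regardless of sequence length.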

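SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbours. Below is a simplified, dependency-free sketch (not the authors' pipeline) that picks base samples at random and oversamples the minority class (drug targets) until the two classes reach a 1:1 ratio. The names `smote` and `balance_1_to_1`, the default `k=5`, and the fixed seed are assumptions for illustration; in practice a maintained implementation such as the `SMOTE` class in imbalanced-learn would normally be used.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def smote(minority: Sequence[Sequence[float]], n_synthetic: int,
          k: int = 5, seed: int = 0) -> List[Vector]:
    """Create n_synthetic points, each on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Euclidean distances to every other minority sample, nearest first.
        ranked = sorted(
            (math.dist(base, minority[j]), j) for j in range(len(minority)) if j != i
        )
        neighbour = minority[rng.choice(ranked[:k])[1]]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


def balance_1_to_1(minority, majority, k: int = 5,
                   seed: int = 0) -> Tuple[List[Vector], List[Vector]]:
    """Oversample the minority class until it matches the majority class size."""
    extra = smote(minority, max(0, len(majority) - len(minority)), k, seed)
    return [list(x) for x in minority] + extra, [list(x) for x in majority]
```

Because each synthetic point lies on a segment between two real minority samples, SMOTE only fills in the region the minority class already occupies. As a general caution, oversampling is usually applied inside each training fold rather than before splitting, so that synthetic points derived from a test sample do not leak into training.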

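The four reported figures are standard confusion-matrix statistics, with drug targets as the positive class. The sketch below computes them; the `Confusion` class and `balanced_mcc` helper are hypothetical names, and the counts in the usage block are invented for illustration because the abstract gives no confusion matrix. `balanced_mcc` assumes equal class sizes. Under that assumption, accuracy is the mean of sensitivity and specificity, and MCC reduces to (sens + spec - 1) / sqrt((1 + sens - spec) * (1 - sens + spec)).

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # drug targets predicted as drug targets
    fn: int  # drug targets predicted as non-targets
    tn: int  # non-targets predicted as non-targets
    fp: int  # non-targets predicted as drug targets

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)

    def mcc(self) -> float:
        """Matthews correlation coefficient; 0.0 when any marginal is empty."""
        denom = math.sqrt((self.tp + self.fp) * (self.tp + self.fn)
                          * (self.tn + self.fp) * (self.tn + self.fn))
        return (self.tp * self.tn - self.fp * self.fn) / denom if denom else 0.0


def balanced_mcc(sens: float, spec: float) -> float:
    """MCC implied by sensitivity and specificity when both classes are equal in size."""
    return (sens + spec - 1) / math.sqrt((1 + sens - spec) * (1 - sens + spec))


if __name__ == "__main__":
    c = Confusion(tp=871, fn=129, tn=836, fp=164)  # hypothetical counts
    print(f"Sn={c.sensitivity():.3f} Sp={c.specificity():.3f} "
          f"Acc={c.accuracy():.4f} MCC={c.mcc():.3f}")
```

Plugging the reported values into `balanced_mcc` gives about 0.707 for 87.1%/83.6% and about 0.712 for 88.1%/83.0%, and the reported accuracies match the mean of sensitivity and specificity; this agreement assumes the evaluated data were balanced 1:1.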
Similar articles

Identification of human drug targets using machine-learning algorithms.

Comput Biol Med. 2015 Jan;56:175-81. doi: 10.1016/j.compbiomed.2014.11.008. Epub 2014 Nov 20.

Unsupervised learning assisted robust prediction of bioluminescent proteins.

Comput Biol Med. 2016 Jan 1;68:27-36. doi: 10.1016/j.compbiomed.2015.10.013. Epub 2015 Nov 10.

A Machine Learning Approach for Drug-target Interaction Prediction using Wrapper Feature Selection and Class Balancing.

Mol Inform. 2020 May;39(5):e1900062. doi: 10.1002/minf.201900062. Epub 2020 Feb 11.

A machine learning based method for the prediction of secretory proteins using amino acid composition, their order and similarity-search.

In Silico Biol. 2008;8(2):129-40.

Predicting network of drug-enzyme interaction based on machine learning method.

Biochim Biophys Acta. 2014 Jan;1844(1 Pt B):214-23. doi: 10.1016/j.bbapap.2013.07.008. Epub 2013 Jul 30.

Enhanced Prediction and Characterization of CDK Inhibitors Using Optimal Class Distribution.

Interdiscip Sci. 2017 Jun;9(2):292-303. doi: 10.1007/s12539-016-0151-1. Epub 2016 Feb 15.

Improved method for predicting beta-turn using support vector machine.

Bioinformatics. 2005 May 15;21(10):2370-4. doi: 10.1093/bioinformatics/bti358. Epub 2005 Mar 29.

Identification of Heat Shock Protein families and J-protein types by incorporating Dipeptide Composition into Chou's general PseAAC.

Comput Methods Programs Biomed. 2015 Nov;122(2):165-74. doi: 10.1016/j.cmpb.2015.07.005. Epub 2015 Jul 22.

TargetMiner: microRNA target prediction with systematic identification of tissue-specific negative examples.

Bioinformatics. 2009 Oct 15;25(20):2625-31. doi: 10.1093/bioinformatics/btp503. Epub 2009 Aug 19.

A Novel Feature Extraction Method with Feature Selection to Identify Golgi-Resident Protein Types from Imbalanced Data.

Int J Mol Sci. 2016 Feb 6;17(2):218. doi: 10.3390/ijms17020218.

Cited by

DrugTar improves druggability prediction by integrating large language models and gene ontologies.

Bioinformatics. 2025 Jul 1;41(7). doi: 10.1093/bioinformatics/btaf360.

Predicting Calcein Release from Ultrasound-Targeted Liposomes: A Comparative Analysis of Random Forest and Support Vector Machine.

Technol Cancer Res Treat. 2024 Jan-Dec;23:15330338241296725. doi: 10.1177/15330338241296725.

KLSD: a kinase database focused on ligand similarity and diversity.

Front Pharmacol. 2024 Jun 18;15:1400136. doi: 10.3389/fphar.2024.1400136. eCollection 2024.

Molecular Design of Novel Herbicide and Insecticide Seed Compounds with Machine Learning.

ACS Omega. 2024 Apr 9;9(16):18488-18494. doi: 10.1021/acsomega.4c00655. eCollection 2024 Apr 23.

OncoRTT: Predicting novel oncology-related therapeutic targets using BERT embeddings and omics features.

Front Genet. 2023 Apr 6;14:1139626. doi: 10.3389/fgene.2023.1139626. eCollection 2023.

Prediction of Drug Targets for Specific Diseases Leveraging Gene Perturbation Data: A Machine Learning Approach.

Pharmaceutics. 2022 Jan 20;14(2):234. doi: 10.3390/pharmaceutics14020234.

Systems biology and machine learning approaches identify drug targets in diabetic nephropathy.

Sci Rep. 2021 Dec 6;11(1):23452. doi: 10.1038/s41598-021-02282-3.

Recent trends in artificial intelligence-driven identification and development of anti-neurodegenerative therapeutic agents.

Mol Divers. 2021 Aug;25(3):1517-1539. doi: 10.1007/s11030-021-10274-8. Epub 2021 Jul 19.

Inferring Relationship of Blood Metabolic Changes and Average Daily Gain With Feed Conversion Efficiency in Murrah Heifers: Machine Learning Approach.

Front Vet Sci. 2020 Sep 2;7:518. doi: 10.3389/fvets.2020.00518. eCollection 2020.

Genome-wide investigation of gene-cancer associations for the prediction of novel therapeutic targets in oncology.

Sci Rep. 2020 Jul 1;10(1):10787. doi: 10.1038/s41598-020-67846-1.
