

A Machine Learning Approach for Drug-target Interaction Prediction using Wrapper Feature Selection and Class Balancing.

Author information

Department of Computer Applications, Manipal Institute of Technology, Manipal Academy of Higher Education, 576104, Manipal, Karnataka, India.

Department of Biological Sciences, Birla Institute of Technology and Science-Pilani, K.K. Birla Goa Campus, 403726, Zuarinagar, Goa, India.

Publication information

Mol Inform. 2020 May;39(5):e1900062. doi: 10.1002/minf.201900062. Epub 2020 Feb 11.

Abstract

Drug-target interactions (DTIs) play a crucial role in drug discovery, drug repositioning, and understanding drug side effects, which helps to identify new therapeutic profiles for various diseases. However, the exponential growth of genomic and drug data makes it difficult to identify new associations between drugs and targets. Computational methods are therefore used to accelerate the DTI identification process. Typically, available data sources of known DTIs are used to train a classifier to predict new DTIs. Such datasets often suffer from class imbalance. In this study we therefore address two challenges faced by such datasets, class imbalance and high dimensionality, to develop a predictive model for DTI prediction. The study is carried out on four protein classes: Enzyme, Ion Channel, G Protein-Coupled Receptor (GPCR), and Nuclear Receptor. We encoded each target protein sequence using its dipeptide composition and each drug with a molecular descriptor. A machine learning approach combining wrapper feature selection and the synthetic minority oversampling technique (SMOTE) is employed to predict DTIs. At best, the ensemble approach achieved accuracies of 95.9%, 93.4%, 90.8%, and 90.6% and precisions of 96.3%, 92.8%, 90.1%, and 90.2% on the Enzyme, Ion Channel, GPCR, and Nuclear Receptor datasets, respectively, when evaluated with 10-fold cross-validation excluding SMOTE samples. Furthermore, our method could predict new drug-target interactions not contained in the training dataset. The features selected by wrapper feature selection may be important for understanding DTIs in the protein categories studied here. Based on our evaluation, the proposed method can be used for understanding and identifying new drug-target interactions. We provide a standalone package at https://github.com/shwetagithub1/predDTI that returns DTI predictions for new query drug-target pairs.
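
The dipeptide composition encoding mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the predDTI package: the function name `dipeptide_composition`, the decision to skip pairs containing non-standard residues, and the normalization by the number of counted pairs are all assumptions made for illustration. The idea is that each protein sequence becomes a fixed-length vector of 400 values, one per ordered pair of the 20 standard amino acids.

```python
from itertools import product

# The 20 standard amino acids and all 400 ordered pairs (dipeptides).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-element list of dipeptide frequencies (hypothetical helper)."""
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    # Slide a window of width 2 over the sequence.
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        # Assumption: pairs with non-standard residues (e.g. X) are ignored.
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    # Assumption: normalize by the number of counted pairs.
    return [counts[dp] / total for dp in DIPEPTIDES]


features = dipeptide_composition("MKVLAAGIVK")
print(len(features))
```

A vector like this would be concatenated with the drug's descriptor to form one drug-target pair sample. The resulting high dimensionality is the reason the abstract pairs the encoding with feature selection.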

