PreDTIs: prediction of drug-target interactions based on multiple feature information using gradient boosting framework with data balancing and feature selection techniques.

Affiliations

School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China.

Electronics and Communication Engineering Discipline, Khulna University, Khulna 9208, Bangladesh.

Publication information

Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab046.

DOI:10.1093/bib/bbab046

PMID:33709119

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7989622/

Abstract

Discovering drug-target (protein) interactions (DTIs) is of great significance for researching and developing novel drugs, and it benefits both the pharmaceutical industry and patients. However, identifying DTIs with wet-lab experiments is generally expensive and time-consuming. Many machine learning-based methods have therefore been developed for this purpose, but a substantial number of interactions remain undiscovered. Furthermore, data imbalance and high feature dimensionality are critical challenges in drug-target datasets; both can degrade classifier performance, and neither has been adequately addressed. This paper proposes a novel drug-target interaction prediction method called PreDTIs. First, feature vectors are extracted from the protein sequence using the pseudo-position-specific scoring matrix (PsePSSM), dipeptide composition (DC) and pseudo amino acid composition (PseAAC), and the drug is encoded with MACCS substructure fingerprints. Next, we propose the FastUS algorithm to handle the class imbalance problem and develop the MoIFS algorithm to remove irrelevant and redundant features and obtain an optimal feature subset. Finally, the balanced, selected features are passed to a LightGBM classifier to identify DTIs, and 5-fold cross-validation is used to evaluate the predictive ability of the proposed method. The results indicate that PreDTIs is significantly superior to other existing methods in predicting DTIs. The model could also be used to discover new drugs for previously unknown disorders or infections, such as coronavirus disease 2019, using existing drug compounds and severe acute respiratory syndrome coronavirus 2 protein sequences.
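Of the protein descriptors named in the abstract, dipeptide composition (DC) is the simplest to illustrate: it is the relative frequency of each of the 400 ordered pairs of adjacent standard amino acids in a sequence. The sketch below is a minimal, standard-library illustration of that general definition, not the authors' implementation. The function name `dipeptide_composition`, the constant names, and the choice to skip pairs containing nonstandard residues (and to normalize by the number of valid pairs) are assumptions made for this example.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 400 ordered dipeptides, in a fixed order, plus a lookup table.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
INDEX = {dipeptide: i for i, dipeptide in enumerate(DIPEPTIDES)}


def dipeptide_composition(sequence):
    """Return a 400-dimensional dipeptide composition vector.

    Each entry is the count of one dipeptide divided by the number of
    valid adjacent residue pairs in the sequence. Pairs that contain a
    nonstandard residue (e.g. 'X') are skipped; this handling is an
    assumption of this sketch, not something stated in the abstract.
    """
    seq = sequence.upper()
    counts = [0] * len(DIPEPTIDES)
    total = 0
    # Slide a window of width 2 over the sequence.
    for first, second in zip(seq, seq[1:]):
        idx = INDEX.get(first + second)
        if idx is None:
            continue  # pair contains a nonstandard residue
        counts[idx] += 1
        total += 1
    if total == 0:
        # Too short (or no valid pairs): return an all-zero vector.
        return [0.0] * len(DIPEPTIDES)
    return [count / total for count in counts]


if __name__ == "__main__":
    vector = dipeptide_composition("ACAC")
    # "ACAC" has three adjacent pairs: AC, CA, AC.
    print(len(vector), vector[INDEX["AC"]], vector[INDEX["CA"]])
```

For a sequence made only of standard residues, the number of valid pairs equals the sequence length minus one, so this matches the usual DC normalization. In a pipeline like the one described, a vector of this kind would be concatenated with the other protein and drug descriptors before balancing and feature selection.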
