Department of Computer Science & Technology, Tongji University, Shanghai 201804, China.
Institutes of Physical Science and Information Technology & School of Internet, Anhui University, Hefei 230601, China.
Int J Mol Sci. 2021 Jun 20;22(12):6598. doi: 10.3390/ijms22126598.
Background: The prediction of drug-target interactions (DTIs) is of great significance in drug development. Traditional experimental methods are time-consuming and expensive. Machine learning can reduce the cost of prediction, but it is limited by imbalanced datasets and by the difficulty of selecting essential features.
Methods: A prediction method based on an Ensemble model of Multiple Feature Pairs (Ensemble-MFP) is introduced. First, three negative sets are generated according to the Euclidean distances computed for three feature pairs. Then, the negative samples of the validation set and of the test set are randomly selected from the union of the three negative sets within the respective set. At the same time, the weighted ensemble model is optimized and applied to the test set.
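The negative-sampling steps can be sketched as follows. The abstract does not say how the Euclidean distance is turned into a negative set, so this sketch assumes that the unlabeled drug-target pairs lying farthest from every known positive pair are kept as negatives. The function names `distant_negatives` and `sample_from_union`, and the seeded sampler, are hypothetical and are not taken from the paper.

```python
import math
import random


def distant_negatives(pos_feats, unl_feats, n_neg):
    """Indices of the n_neg unlabeled pairs farthest from any known positive.

    Assumption: "far from all positives" is the selection rule; the
    abstract only states that Euclidean distance is used.
    """
    # Distance from each unlabeled pair to its nearest positive pair.
    nearest = [min(math.dist(u, p) for p in pos_feats) for u in unl_feats]
    # Largest nearest-positive distance first.
    order = sorted(range(len(unl_feats)), key=lambda i: nearest[i], reverse=True)
    return set(order[:n_neg])


def sample_from_union(negative_sets, n_samples, seed=0):
    """Randomly draw n_samples indices from the union of the negative sets."""
    union = sorted(set().union(*negative_sets))
    return sorted(random.Random(seed).sample(union, n_samples))


# Toy usage: one negative set per feature pair, then sample from their union.
positives = [(0.0, 0.0)]
unlabeled = [(1.0, 0.0), (5.0, 0.0), (3.0, 0.0), (0.5, 0.0)]
set_a = distant_negatives(positives, unlabeled, 2)
set_b = {2, 3}  # stand-ins for the sets from the other two feature pairs
set_c = {0}
negatives = sample_from_union([set_a, set_b, set_c], 3)
```

In practice each of the three negative sets would be computed on a different feature pair, so the union mixes negatives that each representation considers distant from the positives.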
Results: In the prediction of new drugs, the area under the receiver operating characteristic curve (AUC) exceeded 94.0% in three of the four sub-datasets of the gold standard datasets. The effectiveness of the proposed method is further supported by a comparison with state-of-the-art methods and by a demonstration of predicted drug-target pairs.
Conclusions: Ensemble-MFP can weight the existing feature pairs and performs well in general prediction for new drugs.
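One way to weight feature pairs is sketched below. It assumes one score list per feature-pair model, a convex combination of those scores, and a coarse grid search that keeps the weights with the highest validation AUC. The abstract does not describe the actual optimization procedure, so `roc_auc`, `combine`, `fit_weights`, and the `step` parameter are illustrative names only.

```python
from itertools import product


def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def combine(weights, model_scores):
    """Weighted sum of per-model scores for every sample."""
    n_samples = len(model_scores[0])
    return [sum(w * s[i] for w, s in zip(weights, model_scores))
            for i in range(n_samples)]


def fit_weights(val_labels, val_scores, step=0.1):
    """Grid-search three non-negative weights summing to 1 by validation AUC."""
    n = round(1 / step)
    best_w, best_auc = None, -1.0
    for a, b in product(range(n + 1), repeat=2):
        if a + b > n:
            continue
        w = (a / n, b / n, (n - a - b) / n)
        auc = roc_auc(val_labels, combine(w, val_scores))
        if auc > best_auc:  # keep the first weight vector reaching the best AUC
            best_w, best_auc = w, auc
    return best_w, best_auc


# Toy usage: the first model ranks perfectly, the second is inverted,
# and the third is uninformative.
val_labels = [1, 0, 1, 0]
val_scores = [
    [0.9, 0.1, 0.8, 0.2],
    [0.1, 0.9, 0.2, 0.8],
    [0.5, 0.5, 0.5, 0.5],
]
weights, val_auc = fit_weights(val_labels, val_scores)
test_scores = combine(weights, val_scores)  # the tuned weights would be applied to test-set scores
```

A grid search is used here only because it is easy to read; any optimizer over the weight simplex would fit the same description.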