School of Computer Sciences, Universiti Sains Malaysia, Pulau Pinang 11800, Malaysia.
Molecules. 2023 Feb 9;28(4):1663. doi: 10.3390/molecules28041663.
Predicting drug-target interactions (DTIs) is a vital step in drug discovery, and machine learning and deep learning methods that predict DTIs accurately play a major role in it. However, the datasets used with these learning algorithms are usually high-dimensional and extremely imbalanced. To address this, the dataset must be resampled accordingly. In this paper, we compare several data resampling techniques for overcoming class imbalance in machine learning methods, and we study how effectively deep learning methods overcome class imbalance in DTI prediction, framed as binary classification, using ten (10) cancer-related activity classes from BindingDB. We find that Random Undersampling (RUS) severely degrades model performance in DTI prediction, especially when the dataset is highly imbalanced, which renders RUS unreliable. We also find that SVM-SMOTE can serve as a go-to resampling method when paired with the Random Forest and Gaussian Naïve Bayes classifiers: with these pairings, a high F1 score is recorded for all severely and moderately imbalanced activity classes. Additionally, the deep learning method Multilayer Perceptron recorded high F1 scores for all activity classes even when no resampling method was applied.
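To make the two central notions in the abstract concrete, the following is a minimal sketch of random undersampling and of the F1 score, written in plain Python. It is not the authors' implementation: the function names `random_undersample` and `f1_score`, the toy dataset, and the 990:10 class ratio are all assumptions chosen for illustration. The sketch shows how much data RUS discards on a highly imbalanced set, which is one plausible contributor to the degraded performance reported above, and why F1 rather than accuracy is the informative metric under imbalance.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly drop samples so every class shrinks to the minority class size.

    Hypothetical helper for illustration, not taken from the paper.
    """
    rng = random.Random(seed)
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    n_min = min(len(rows) for rows in by_class.values())
    X_res, y_res = [], []
    for label, rows in by_class.items():
        # Sample without replacement down to the minority class size.
        for features in rng.sample(rows, n_min):
            X_res.append(features)
            y_res.append(label)
    return X_res, y_res


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (assumed, not from BindingDB): 990 inactive (0), 10 active (1).
X = [[i] for i in range(1000)]
y = [0] * 990 + [1] * 10

X_res, y_res = random_undersample(X, y)
counts = Counter(y_res)                 # class sizes after undersampling
kept_fraction = len(y_res) / len(y)     # share of the original data retained

# A trivial model that always predicts "inactive" looks accurate but has F1 = 0.
always_inactive = [0] * len(y)
accuracy = sum(t == p for t, p in zip(y, always_inactive)) / len(y)
f1_trivial = f1_score(y, always_inactive)
```

On this toy set, undersampling retains only the 10 active samples plus 10 randomly chosen inactive ones, so most of the majority class never reaches the classifier. Oversampling methods such as SVM-SMOTE instead synthesize additional minority samples and keep the majority class intact.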