Arabi Nooshin, Torabi Mohammad Reza, Ghasemi Fahimeh
Department of Bioelectric, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran.
Department of Bioinformatics, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran.
Adv Biomed Res. 2024 Jul 29;13:52. doi: 10.4103/abr.abr_179_23. eCollection 2024.
With cancer mortality on the rise, finding effective cancer inhibitors is vital. Angiogenesis, the formation of new blood vessels from existing ones, undergoes abnormal changes in solid tumors. Vascular endothelial growth factor receptor (VEGFR) plays a crucial role in angiogenesis; hence, inhibiting VEGFR signaling to prevent angiogenesis has been suggested as a cancer treatment strategy. Computational approaches offer an alternative that reduces time and cost. This study aimed to use a classification algorithm to separate potent inhibitors from inactive ones.
To apply the machine learning model, biological compounds were extracted from the BindingDB database. Because of the large number of molecular features, the classification model was susceptible to overfitting. To address this issue, a correlation-based feature selection algorithm was proposed for feature reduction. For the classification step, support vector machine models with both linear and non-linear kernels were employed.
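As an illustration of the feature reduction idea described above, the following is a minimal sketch of a generic correlation-based filter. It is not the authors' algorithm: the greedy ranking rule, the `redundancy_cutoff` value, the function names, and the toy descriptor values are all assumptions made for this example. It ranks features by their absolute Pearson correlation with the activity label and skips any feature that is highly correlated with one already kept.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, labels, redundancy_cutoff=0.9):
    """Greedy filter (hypothetical rule): rank features by |correlation with label|,
    then keep a feature only if it is not strongly correlated with any kept feature."""
    ranked = sorted(
        columns,
        key=lambda name: abs(pearson(columns[name], labels)),
        reverse=True,
    )
    kept = []
    for name in ranked:
        if all(
            abs(pearson(columns[name], columns[k])) <= redundancy_cutoff
            for k in kept
        ):
            kept.append(name)
    return kept


# Toy data (invented for illustration): 1 = potent inhibitor, 0 = inactive.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "mol_weight": [410.0, 395.0, 420.0, 250.0, 265.0, 240.0],
    # Exactly twice mol_weight, so it carries no additional information.
    "mol_weight_copy": [820.0, 790.0, 840.0, 500.0, 530.0, 480.0],
    "ring_count": [3, 4, 2, 3, 2, 4],
}

selected = select_features(columns, labels)
print(selected)
```

In a pipeline of the kind the abstract describes, the retained columns would then be passed to a support vector machine classifier, for example scikit-learn's `SVC` with `kernel="linear"` or `kernel="rbf"`; that choice of library is likewise an assumption, since the abstract does not name the software used.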
The support vector machine model with the radial basis function kernel, combined with the correlation-based feature selection method, achieved a higher accuracy (81.8%, P value = 0.008) than the other feature selection methods used in this study. Finally, two structures with the highest binding affinity were introduced to inhibit the second VEGFR.
According to the results, the correlation-based feature selection method is more accurate than the other feature selection methods evaluated in this study.