MotieGhader Habib, Gharaghani Sajjad, Masoudi-Sobhanzadeh Yosef, Masoudi-Nejad Ali
Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran.
Laboratory of Bioinformatics and Drug Design (LBD), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran.
Iran J Pharm Res. 2017 Spring;16(2):533-553.
Feature selection is of great importance in Quantitative Structure-Activity Relationship (QSAR) analysis. The problem has previously been addressed with meta-heuristic algorithms such as the genetic algorithm (GA), particle swarm optimization (PSO), and ant colony optimization (ACO). In this work we propose two novel hybrid meta-heuristic algorithms for QSAR feature selection, Sequential GA and LA (SGALA) and Mixed GA and LA (MGALA), both based on the genetic algorithm and learning automata (LA). SGALA applies the genetic algorithm and learning automata sequentially, whereas MGALA applies them simultaneously. We used the proposed algorithms to select the minimum possible number of features from three different datasets and observed that MGALA and SGALA gave the best results, both individually and on average, compared with the other feature selection algorithms. Comparing the algorithms, we found that MGALA and SGALA converged to the optimal result faster than GA, ACO, PSO, and LA. Finally, the features selected by GA, ACO, PSO, LA, SGALA, and MGALA were used as input to least squares support vector regression (LS-SVR) models, and the LS-SVR models built on the SGALA and MGALA selections showed greater predictive ability than those built on the selections of the other algorithms. The results therefore indicate that the proposed algorithms are superior to the other algorithms in both predictive efficiency and rate of convergence.
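The abstract does not give the update rules of SGALA or MGALA, so the following is only a minimal sketch of the general idea of running a genetic algorithm and a learning automaton side by side for feature selection. Everything in it is an assumption made for illustration: the function names `fitness` and `ga_la_select`, the toy fitness (which stands in for the validation score of a QSAR model), the tournament selection, the mutation rate, and the linear reward-style probability update. It should not be read as the authors' method.

```python
import random


def fitness(mask, relevant, penalty=0.1):
    """Toy score (hypothetical): reward relevant features, penalize subset size.

    In a real QSAR setting this would be a model's validation score.
    """
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in relevant)
    return hits - penalty * sum(mask)


def ga_la_select(n_features, relevant, pop_size=20, generations=40,
                 lr=0.1, mutation_rate=0.05, seed=0):
    """Interleave a GA step and a learning-automaton step each generation."""
    rng = random.Random(seed)
    # Learning-automaton state: probability of selecting each feature.
    probs = [0.5] * n_features
    # Initial population of binary feature masks sampled from those probabilities.
    pop = [[int(rng.random() < p) for p in probs] for _ in range(pop_size)]
    best = max(pop, key=lambda m: fitness(m, relevant))

    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            # GA step: binary tournament selection of two parents.
            a = max(rng.sample(pop, 2), key=lambda m: fitness(m, relevant))
            b = max(rng.sample(pop, 2), key=lambda m: fitness(m, relevant))
            # Uniform crossover.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            # Mutation: occasionally resample a bit from the automaton's probabilities.
            child = [int(rng.random() < probs[i]) if rng.random() < mutation_rate else bit
                     for i, bit in enumerate(child)]
            children.append(child)
        pop = children

        # Keep track of the best mask seen so far.
        gen_best = max(pop, key=lambda m: fitness(m, relevant))
        if fitness(gen_best, relevant) > fitness(best, relevant):
            best = gen_best

        # LA step: move each selection probability toward the best mask.
        probs = [p + lr * (bit - p) for p, bit in zip(probs, best)]

    return best, probs


if __name__ == "__main__":
    best_mask, probabilities = ga_la_select(12, relevant={1, 4, 7})
    print("selected features:", [i for i, bit in enumerate(best_mask) if bit])
```

Because both steps run in every generation, the sketch is closer in spirit to a mixed scheme than to a sequential one. A sequential variant could, under the same assumptions, run one component to completion and use its output to initialize the other.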