Liu Tian-Yu, Li Guo-Zheng, Yang Jack Y, Yang Mary Qu
School of Electric, Shanghai Dianji University, Shanghai, China.
Int J Comput Biol Drug Des. 2008;1(4):334-46. doi: 10.1504/ijcbdd.2008.022206.
The activities of drug molecules can be predicted by Quantitative Structure Activity Relationship (QSAR) models, which avoid the high cost and long cycle of the traditional experimental method. Because drug molecules with positive activity are far fewer than those with negative activity, it is important to predict molecular activities with this class imbalance taken into account. Here we propose an embedded feature selection algorithm, Prediction Risk based feature selection for EasyEnsemble (PREE), to address this problem and improve the generalisation performance of the EasyEnsemble classifier. Experimental results on drug molecule data sets show that PREE outperforms both asymmetric bagging and EasyEnsemble.
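The abstract does not spell out how PREE works, so the following is only a minimal sketch of the general idea, written under stated assumptions rather than as the authors' implementation. It assumes that EasyEnsemble-style sampling pairs every positive with an equally sized random draw of negatives, and that the prediction risk of a feature is the increase in error when that feature is replaced by its mean. A nearest-centroid classifier stands in for the AdaBoost learners that EasyEnsemble normally uses, and all function names and the toy data are hypothetical.

```python
import random

def balanced_subsets(X, y, n_subsets, rng):
    """EasyEnsemble-style sampling (assumed form): each subset keeps every
    positive and draws an equally sized random sample of negatives."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    subsets = []
    for _ in range(n_subsets):
        idx = pos + rng.sample(neg, len(pos))
        subsets.append(([X[i] for i in idx], [y[i] for i in idx]))
    return subsets

def fit_centroids(X, y):
    """Stand-in base learner: per-class feature means (nearest centroid).
    This is NOT the AdaBoost learner used by EasyEnsemble."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Return the label whose centroid is closest in squared distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

def error_rate(model, X, y):
    return sum(predict(model, x) != t for x, t in zip(X, y)) / len(y)

def prediction_risk(model, X, y):
    """Assumed definition: risk of feature j is the error with column j
    replaced by its mean, minus the baseline error."""
    base = error_rate(model, X, y)
    means = [sum(col) / len(X) for col in zip(*X)]
    risks = []
    for j, m in enumerate(means):
        X_masked = [x[:j] + [m] + x[j + 1:] for x in X]
        risks.append(error_rate(model, X_masked, y) - base)
    return risks

rng = random.Random(0)
# Hypothetical toy data: 5 actives, 50 inactives.
# Feature 0 separates the classes; feature 1 carries almost no signal.
X = [[1.0 if i < 5 else 0.0, (i % 3) * 0.01] for i in range(55)]
y = [1 if i < 5 else 0 for i in range(55)]

subsets = balanced_subsets(X, y, n_subsets=4, rng=rng)
per_subset = [prediction_risk(fit_centroids(Xs, ys), Xs, ys) for Xs, ys in subsets]
# Average the risk of each feature over all balanced subsets.
mean_risk = [sum(col) / len(per_subset) for col in zip(*per_subset)]
# Features sorted from highest to lowest risk; the tail is the removal candidate.
ranking = sorted(range(len(mean_risk)), key=lambda j: mean_risk[j], reverse=True)
print(ranking)
```

In an embedded scheme of this kind, the lowest-risk feature would be dropped and the ensemble retrained, repeating until performance stops improving. Here the risk is measured on the training subset itself for brevity; a held-out set would be the more careful choice.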