Hao Ming, Li Yan, Wang Yonghua, Zhang Shuwei
School of Chemical Engineering, Dalian University of Technology, Dalian, Liaoning 116012, China
Int J Mol Sci. 2011 Feb 21;12(2):1259-80. doi: 10.3390/ijms12021259.
Experimental pEC50 values for 216 selective respiratory syncytial virus (RSV) inhibitors are used to develop classification models as a potential screening tool for large libraries of target compounds. A variable selection algorithm coupled with random forests (VS-RF) is used to extract the physicochemical features most relevant to RSV inhibition. Based on the small set of selected descriptors, four other widely used approaches, namely support vector machine (SVM), Gaussian process (GP), linear discriminant analysis (LDA) and k nearest neighbors (kNN), are also applied and compared with the VS-RF method against several rigorous evaluation criteria. The results indicate that the VS-RF model is a powerful tool for classifying RSV inhibitors: it achieves the highest overall accuracy, 94.34%, on the external prediction set, significantly outperforming the other four methods, whose average accuracy is 80.66%. With strong predictive performance in both internal and external validation, the proposed model should be useful for screening and optimizing potential RSV inhibitors prior to chemical synthesis in drug development.
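As a rough illustration of the evaluation described above, the following sketch classifies an external prediction set and scores it by overall accuracy. It uses kNN, one of the four comparison methods named in the abstract, and not the VS-RF model itself. The descriptor vectors, the "active"/"inactive" labels, the choice of k = 3, and the function names `knn_predict` and `overall_accuracy` are all assumptions made for illustration; none of them come from the paper.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Rank training compounds by Euclidean distance in descriptor space.
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def overall_accuracy(y_true, y_pred):
    """Fraction of compounds whose predicted class matches the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical two-descriptor training set (values are made up, not from the paper).
train_X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.3), (0.9, 0.8), (0.8, 0.95), (0.85, 0.7)]
train_y = ["inactive", "inactive", "inactive", "active", "active", "active"]

# Hypothetical external prediction set, held out from training.
test_X = [(0.05, 0.25), (0.95, 0.9), (0.2, 0.2), (0.7, 0.8)]
test_y = ["inactive", "active", "inactive", "active"]

predictions = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
accuracy = overall_accuracy(test_y, predictions)
print(f"external-set accuracy: {accuracy:.2%}")
```

Scoring only on compounds held out from training is what makes the reported figure an external accuracy. The same `overall_accuracy` function could be applied to the predictions of any of the five classifiers to compare them on equal terms.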