Ganguly Milan, Brown Nathan, Schuffenhauer Ansgar, Ertl Peter, Gillet Valerie J, Greenidge Paulette A
Novartis Institutes for BioMedical Research, Basel, CH-4002, Switzerland.
J Chem Inf Model. 2006 Sep-Oct;46(5):2110-24. doi: 10.1021/ci050529l.
An evolutionary statistical learning method was applied to classify drugs according to their biological target and also to discriminate between a compilation of oral and nonoral drugs. The emphasis was placed not only on how well the models predict but also on their interpretability. In an enhancement to previous studies, the consistency of the model weights over several runs of the genetic algorithm was considered with the goal of producing comprehensible models. Via this approach, the descriptors and their ranges that contribute most to class discrimination were identified. Selecting a bin step size that enables the average descriptor properties of the class being trained to be captured improves the interpretability and discriminatory power of a model. The performance, consistency, and robustness of such models were further enhanced by using two novel approaches that reduce the variability between individual solutions: consensus and splice modeling. Finally, the ability of the genetic algorithm to discriminate between activity classes was compared with a similarity searching method, while naïve Bayes classifiers and support vector machines were applied in discriminating the oral and nonoral drugs.
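The abstract does not spell out how binned descriptor weights or consensus modeling are computed, so the following is only a minimal sketch under stated assumptions. It supposes that each genetic algorithm run yields one weight per descriptor bin, and that a consensus model averages those weights across runs, with the per-bin spread serving as a rough consistency indicator. The function names, the descriptor range, the bin step size, and the weights are all hypothetical illustrations, not values or methods taken from the paper.

```python
from statistics import mean, pstdev


def bin_index(value, lower, step, n_bins):
    """Map a descriptor value to a bin index, clamping to the first/last bin."""
    idx = int((value - lower) // step)
    return max(0, min(n_bins - 1, idx))


def consensus_weights(runs):
    """Average per-bin weights across runs and report the per-bin spread.

    Assumption: 'consensus' here means a simple mean over runs; the paper
    may define it differently.
    """
    columns = list(zip(*runs))
    avg = [mean(col) for col in columns]
    spread = [pstdev(col) for col in columns]
    return avg, spread


def score(value, weights, lower, step):
    """Return the weight of the bin that the descriptor value falls into."""
    return weights[bin_index(value, lower, step, len(weights))]


# Hypothetical weights for a single descriptor split into 4 bins of
# width 100 starting at 200, taken from three independent runs.
runs = [
    [0.1, 0.8, 0.6, -0.4],
    [0.3, 0.6, 0.6, -0.2],
    [0.2, 0.7, 0.6, -0.3],
]
avg, spread = consensus_weights(runs)
```

In this reading, a bin whose spread is small relative to its averaged weight would be one where the runs agree, which is the kind of consistency the abstract associates with interpretable models.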