Mballo Chérif, Makarenkov Vladimir
Département d'informatique, Université du Québec à Montréal, Montreal, Québec, Canada.
Comb Chem High Throughput Screen. 2010 Jun;13(5):430-41. doi: 10.2174/138620710791292958.
High-throughput screening (HTS) remains a very costly process despite many recent technological advances in biotechnology. In this study we consider the application of machine learning methods to predicting experimental HTS measurements. Such a virtual HTS analysis can be based on the results of real HTS campaigns carried out with similar compound libraries and similar drug targets. Following this approach, we analyzed the Test assay from the McMaster University Data Mining and Docking Competition using binary decision trees, neural networks, support vector machines (SVM), linear discriminant analysis, k-nearest neighbors and partial least squares. First, we studied the sets of molecular and atomic descriptors separately in order to establish which of them provides better predictions. Then, the six machine learning methods were compared in terms of false positives, false negatives, sensitivity and enrichment factor. Finally, a variable selection procedure designed to improve the method's sensitivity was implemented and applied in the framework of polynomial SVM.
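The comparison criteria named in the abstract (false positives, false negatives, sensitivity and enrichment factor) can be illustrated with a minimal sketch. The abstract does not give formulas, so the definitions below are assumptions based on common usage: sensitivity is taken as TP / (TP + FN), and the enrichment factor as the hit rate among compounds predicted active divided by the hit rate in the whole library. All names (`ScreeningCounts`, `confusion_counts`, and so on) and the toy labels are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ScreeningCounts:
    """Confusion counts for a binary hit / non-hit prediction."""
    tp: int  # true hits predicted as hits
    fp: int  # non-hits predicted as hits (false positives)
    tn: int  # non-hits predicted as non-hits
    fn: int  # true hits predicted as non-hits (false negatives)


def confusion_counts(actual, predicted):
    """Count TP, FP, TN and FN from two equal-length sequences of 0/1 labels."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a and p:
            tp += 1
        elif p:
            fp += 1
        elif a:
            fn += 1
        else:
            tn += 1
    return ScreeningCounts(tp, fp, tn, fn)


def sensitivity(c):
    """Fraction of true hits recovered (assumed definition: TP / (TP + FN))."""
    hits = c.tp + c.fn
    return c.tp / hits if hits else 0.0


def enrichment_factor(c):
    """Hit rate among predicted actives relative to the library-wide hit rate.

    Assumed definition: (TP / selected) / (hits / total).
    """
    selected = c.tp + c.fp
    hits = c.tp + c.fn
    total = c.tp + c.fp + c.tn + c.fn
    if selected == 0 or hits == 0:
        return 0.0
    return (c.tp * total) / (selected * hits)


# Toy data: 10 compounds, 2 true hits, 2 compounds predicted active.
actual = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

counts = confusion_counts(actual, predicted)
print(counts)
print("sensitivity:", sensitivity(counts))
print("enrichment factor:", enrichment_factor(counts))
```

Under these assumed definitions, an enrichment factor above 1 means that the compounds a method flags as active contain proportionally more true hits than the library as a whole, which is the situation in which virtual screening can reduce the number of compounds tested experimentally.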