Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.
J Chem Inf Model. 2010 May 24;50(5):716-31. doi: 10.1021/ci9003865.
With chemical libraries increasingly containing millions of compounds or more, there is a fast-growing need for computational methods that can rank or prioritize compounds for screening. Machine learning methods have shown considerable promise for this task. Classification methods such as support vector machines (SVMs) and their variants have been used in virtual screening to distinguish active compounds from inactive ones, while regression methods such as partial least-squares (PLS) and support vector regression (SVR) have been used in quantitative structure-activity relationship (QSAR) analysis to predict the biological activities of compounds. Recently, a new class of machine learning methods, namely ranking methods designed to directly optimize ranking performance, has been developed for ranking tasks such as web search that arise in information retrieval (IR) and other applications. Here we report the application of these ranking methods to the task of ranking chemical structures. Our experiments show that the new ranking methods give better ranking performance than both the classification-based methods used in virtual screening and the regression methods used in QSAR analysis. We also draw some connections between the ranking performance measures used in cheminformatics and those used in IR studies.
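To illustrate what "directly optimizing ranking performance" can mean, the following is a minimal sketch of one common family of ranking methods: a linear scorer trained with a pairwise hinge loss, in the style of a ranking SVM. The abstract does not say which ranking algorithms were used, so this is an assumed stand-in rather than the authors' method. The function names `train_pairwise_ranker` and `pairwise_accuracy`, the hyperparameters, and the synthetic descriptors and activities are all hypothetical.

```python
import numpy as np


def train_pairwise_ranker(X, y, epochs=200, lr=0.1, reg=1e-3):
    """Fit a linear scorer w with a pairwise hinge loss (illustrative sketch).

    X: (n, d) array of compound descriptors (assumed input format).
    y: (n,) array of activities; higher means the compound should rank higher.
    """
    # Every ordered pair (i, j) in which compound i is more active than j.
    i_idx, j_idx = np.where(y[:, None] > y[None, :])
    diffs = X[i_idx] - X[j_idx]
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        # A pair is violated when i does not outscore j by a margin of 1.
        violated = diffs @ w < 1.0
        # Subgradient of the L2-regularized mean pairwise hinge loss.
        grad = reg * w - diffs[violated].sum(axis=0) / len(diffs)
        w -= lr * grad
    return w


def pairwise_accuracy(scores, y):
    """Fraction of pairs with different activities that are ordered correctly."""
    i_idx, j_idx = np.where(y[:, None] > y[None, :])
    return float(np.mean(scores[i_idx] > scores[j_idx]))


# Synthetic stand-in data: 100 "compounds" with 5 made-up descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + 0.1 * rng.normal(size=100)

w = train_pairwise_ranker(X, y)
print("pairwise accuracy:", pairwise_accuracy(X @ w, y))
```

Unlike a classifier or a regressor, this loss depends only on the relative order of compound pairs, not on class labels or on the absolute activity values, which is the distinction the abstract draws between ranking methods and the classification and regression approaches.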