Jasial Swarit, Balfer Jenny, Vogt Martin, Bajorath Jürgen
Department of Life Science Informatics, Bonn-Aachen International Center for Information Technology, Rheinische Friedrich-Wilhelms-Universität Bonn, Dahlmannstr. 2, 53113 Bonn, Germany. Tel: +49-228-2699-306; fax: +49-228-2699-341.
Mol Inform. 2015 Feb;34(2-3):127-33. doi: 10.1002/minf.201400163. Epub 2015 Feb 17.
Support vector machines (SVMs) are among the most popular machine learning methods for compound classification and other chemoinformatics tasks, such as the prediction of ligand-target pairs or compound activity profiles. Depending on the specific application, different SVM strategies can be used. For example, in potency-directed virtual screening, linear combinations of multiple SVM models have been shown to enrich database selection sets with potent compounds compared to individual models. An open question concerning the use of SVM linear combinations (SVM-LCs) is how best to weight the models relative to each other. Typically, linear weights are set subjectively. Herein, preferred weighting factors for SVM-LCs were systematically determined. To this end, weights were treated as meta-parameters and optimized by machine learning to enrich data set rankings with highly active compounds. The meta-parameter approach was applied to 10 screening data sets and found to further improve SVM performance over other SVM-LCs and support vector regression (SVR) models. The results show that optimal weights depend on data set characteristics and the chosen molecular representations. In addition, individual models often do not contribute to the performance of SVM-LCs. Taken together, these findings emphasize the need for systematic meta-parameter estimation.
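The idea of treating the weights of a linear combination as meta-parameters can be illustrated with a minimal sketch. The abstract does not specify the optimization procedure, so the exhaustive grid search below is a stand-in chosen for simplicity and is not the authors' method. The function names, the toy decision scores, the potency labels, and the recall-in-top-k objective are all hypothetical. The sketch assumes that each model has already produced one decision score per compound.

```python
from itertools import product


def combine_scores(model_scores, weights):
    """Weighted sum of per-model decision scores for each compound."""
    n_compounds = len(model_scores[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, model_scores))
        for i in range(n_compounds)
    ]


def recall_at_k(combined, is_potent, k):
    """Fraction of highly potent compounds found among the k top-ranked ones."""
    ranking = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
    hits = sum(1 for i in ranking[:k] if is_potent[i])
    total = sum(1 for flag in is_potent if flag)
    return hits / total if total else 0.0


def weight_grid(n_models, steps):
    """Yield all non-negative weight vectors that sum to 1 on a 1/steps grid."""
    for counts in product(range(steps + 1), repeat=n_models):
        if sum(counts) == steps:
            yield tuple(c / steps for c in counts)


def optimize_weights(model_scores, is_potent, k, steps=4):
    """Grid search (an assumed stand-in) for weights maximizing recall_at_k."""
    best_weights, best_value = None, -1.0
    for weights in weight_grid(len(model_scores), steps):
        value = recall_at_k(combine_scores(model_scores, weights), is_potent, k)
        if value > best_value:  # keep the first weight vector reaching the best value
            best_weights, best_value = weights, value
    return best_weights, best_value


# Toy data (made up): decision scores of three models for six compounds.
model_scores = [
    [0.9, 0.1, 0.8, 0.2, 0.3, 0.1],  # model A ranks the potent compounds first
    [0.2, 0.9, 0.1, 0.8, 0.2, 0.3],  # model B ranks other compounds first
    [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],  # model C is uninformative
]
is_potent = [True, False, True, False, False, False]

weights, value = optimize_weights(model_scores, is_potent, k=2)
print("weights:", weights, "recall in top 2:", value)
```

In this toy setting, model B receives no weight in the selected combination, loosely mirroring the observation that individual models often do not contribute to an SVM-LC. In practice the weights would be selected on training or validation data and then assessed on held-out compounds, and a finer grid or a different optimizer could replace the exhaustive search.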