Byvatov Evgeny, Fechner Uli, Sadowski Jens, Schneider Gisbert
Institut für Organische Chemie und Chemische Biologie, Johann Wolfgang Goethe-Universität, Marie-Curie-Strasse 11, D-60439 Frankfurt, Germany.
J Chem Inf Comput Sci. 2003 Nov-Dec;43(6):1882-9. doi: 10.1021/ci0341161.
Support vector machine (SVM) and artificial neural network (ANN) systems were applied to a drug/nondrug classification problem as an example of binary decision problems in early-phase virtual compound filtering and screening. The results indicate that solutions obtained by SVM training seem to be more robust, with a smaller standard error, than those obtained by ANN training. Generally, the SVM classifier yielded slightly higher prediction accuracy than ANN, irrespective of the type of descriptors used for molecule encoding, the size of the training data sets, and the algorithm employed for neural network training. Performance was compared using various descriptor sets and descriptor combinations based on the 120 standard Ghose-Crippen fragment descriptors, 180 property and physicochemical descriptors from the Molecular Operating Environment (MOE) package, and 225 topological pharmacophore (CATS) descriptors. For the complete set of 525 descriptors, cross-validated classification by SVM yielded 82% correct predictions (Matthews correlation coefficient, cc = 0.63), whereas ANN reached 80% correct predictions (Matthews cc = 0.58). Although SVM outperformed the ANN classifiers with regard to overall prediction accuracy, the two methods were shown to complement each other, as the sets of true positives, false positives (overprediction), true negatives, and false negatives (underprediction) produced by the two classifiers were not identical. The theory of SVM and ANN training is briefly reviewed.
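The two figures of merit quoted in the abstract, the fraction of correct predictions and the Matthews correlation coefficient, can both be derived from the four confusion-matrix counts. The following is a minimal sketch using the standard definition of the Matthews coefficient; the function names, the label convention (1 = drug, 0 = nondrug), and the toy labels are illustrative assumptions and are not taken from the study.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels; 1 = drug, 0 = nondrug (assumed convention)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # overprediction
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # underprediction
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of correct predictions."""
    return (tp + tn) / (tp + fp + tn + fn)


def matthews_cc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy labels, invented purely for illustration (not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
print(counts)                # (3, 1, 3, 1)
print(accuracy(*counts))     # 0.75
print(matthews_cc(*counts))  # 0.5
```

Unlike plain accuracy, the Matthews coefficient uses all four counts, so it remains informative when the two classes are of unequal size.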