Support vector inductive logic programming outperforms the naive Bayes classifier and inductive logic programming for the classification of bioactive chemical compounds.

Author information

Cannon Edward O, Amini Ata, Bender Andreas, Sternberg Michael J E, Muggleton Stephen H, Glen Robert C, Mitchell John B O

Affiliation

Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, UK.

Publication information

J Comput Aided Mol Des. 2007 May;21(5):269-80. doi: 10.1007/s10822-007-9113-3. Epub 2007 Mar 27.

DOI:10.1007/s10822-007-9113-3

PMID:17387437

Abstract

We investigate the classification performance of circular fingerprints in combination with the Naive Bayes Classifier (MP2D), Inductive Logic Programming (ILP) and Support Vector Inductive Logic Programming (SVILP) on a standard molecular benchmark dataset comprising 11 activity classes and about 102,000 structures. The Naive Bayes Classifier treats features independently while ILP combines structural fragments, and then creates new features with higher predictive power. SVILP is a very recently presented method which adds a support vector machine after common ILP procedures. The performance of the methods is evaluated via a number of statistical measures, namely recall, specificity, precision, F-measure, Matthews Correlation Coefficient, area under the Receiver Operating Characteristic (ROC) curve and enrichment factor (EF). According to the F-measure, which takes both recall and precision into account, SVILP is for seven out of the 11 classes the superior method. The results show that the Bayes Classifier gives the best recall performance for eight of the 11 targets, but has a much lower precision, specificity and F-measure. The SVILP model on the other hand has the highest recall for only three of the 11 classes, but generally far superior specificity and precision. To evaluate the statistical significance of the SVILP superiority, we employ McNemar's test which shows that SVILP performs significantly (p < 5%) better than both other methods for six out of 11 activity classes, while being superior with less significance for three of the remaining classes. While previously the Bayes Classifier was shown to perform very well in molecular classification studies, these results suggest that SVILP is able to extract additional knowledge from the data, thus improving classification results further.
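
As a rough illustration of the evaluation described in the abstract, the sketch below computes recall, specificity, precision, F-measure and the Matthews Correlation Coefficient from confusion-matrix counts, and runs a McNemar test on the disagreements between two classifiers. The function names and all counts are hypothetical and not taken from the paper. The continuity-corrected chi-squared form of McNemar's test is an assumption, since the abstract does not say which variant was used. ROC area and enrichment factor are left out because they need ranked scores rather than counts.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Return recall, specificity, precision, F-measure and MCC.

    Inputs are the four confusion-matrix counts for one activity class.
    Ratios with a zero denominator are reported as 0.0 (a convention
    chosen for this sketch).
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F-measure: harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "mcc": mcc,
    }


def mcnemar_test(b, c):
    """McNemar's test with continuity correction (assumed variant).

    b: compounds classifier A labels correctly and classifier B does not.
    c: compounds classifier B labels correctly and classifier A does not.
    Returns (chi-squared statistic, two-sided p-value, 1 degree of freedom).
    """
    if b + c == 0:
        return 0.0, 1.0
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-squared with 1 d.o.f.: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts for a single activity class.
    metrics = confusion_metrics(tp=80, fp=20, tn=880, fn=20)
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")

    # Hypothetical disagreement counts between two classifiers.
    chi2, p_value = mcnemar_test(b=25, c=10)
    print(f"McNemar chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

Because McNemar's test uses only the compounds on which the two classifiers disagree, it suits a paired comparison of methods evaluated on the same test set.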

Similar articles

A general approach for developing system-specific functions to score protein-ligand docked complexes using support vector inductive logic programming.

Proteins. 2007 Dec 1;69(4):823-31. doi: 10.1002/prot.21782.

A novel logic-based approach for quantitative toxicology prediction.

J Chem Inf Model. 2007 May-Jun;47(3):998-1006. doi: 10.1021/ci600223d. Epub 2007 Apr 24.

Discovering rules for protein-ligand specificity using support vector inductive logic programming.

Protein Eng Des Sel. 2009 Sep;22(9):561-7. doi: 10.1093/protein/gzp035. Epub 2009 Jul 2.

In silico target predictions: defining a benchmarking data set and comparison of performance of the multiclass Naïve Bayes and Parzen-Rosenblatt window.

J Chem Inf Model. 2013 Aug 26;53(8):1957-66. doi: 10.1021/ci300435j. Epub 2013 Jul 24.

"Bayes affinity fingerprints" improve retrieval rates in virtual screening and define orthogonal bioactivity space: when are multitarget drugs a feasible concept?

J Chem Inf Model. 2006 Nov-Dec;46(6):2445-56. doi: 10.1021/ci600197y.

Automated validation of patient safety clinical incident classification: macro analysis.

Stud Health Technol Inform. 2013;188:52-7.

Comparison of Support Vector Machine, Naïve Bayes and Logistic Regression for Assessing the Necessity for Coronary Angiography.

Int J Environ Res Public Health. 2020 Sep 4;17(18):6449. doi: 10.3390/ijerph17186449.

Developing novel computational prediction models for assessing chemical-induced neurotoxicity using naïve Bayes classifier technique.

Food Chem Toxicol. 2020 Sep;143:111513. doi: 10.1016/j.fct.2020.111513. Epub 2020 Jul 1.

Developing novel in silico prediction models for assessing chemical reproductive toxicity using the naïve Bayes classifier method.

J Appl Toxicol. 2020 Sep;40(9):1198-1209. doi: 10.1002/jat.3975. Epub 2020 Mar 23.

Cited by

PyPLIF HIPPOS-Assisted Prediction of Molecular Determinants of Ligand Binding to Receptors.

Molecules. 2021 Apr 22;26(9):2452. doi: 10.3390/molecules26092452.

Incorporating Virtual Reactions into a Logic-based Ligand-based Virtual Screening Method to Discover New Leads.

Mol Inform. 2015 Sep;34(9):615-625. doi: 10.1002/minf.201400162. Epub 2015 Mar 20.

Machine learning methods in chemoinformatics.

Wiley Interdiscip Rev Comput Mol Sci. 2014 Sep 1;4(5):468-481. doi: 10.1002/wcms.1183.

The influence of negative training set size on machine learning-based virtual screening.

J Cheminform. 2014 Jun 11;6:32. doi: 10.1186/1758-2946-6-32. eCollection 2014.

Enzyme informatics.

Curr Top Med Chem. 2012;12(17):1911-23. doi: 10.2174/156802612804547353.

Quantitative comparison of catalytic mechanisms and overall reactions in convergently evolved enzymes: implications for classification of enzyme function.

PLoS Comput Biol. 2010 Mar 12;6(3):e1000700. doi: 10.1371/journal.pcbi.1000700.

Discovering rules for protein-ligand specificity using support vector inductive logic programming.

Protein Eng Des Sel. 2009 Sep;22(9):561-7. doi: 10.1093/protein/gzp035. Epub 2009 Jul 2.

A novel hybrid ultrafast shape descriptor method for use in virtual screening.

Chem Cent J. 2008 Feb 18;2:3. doi: 10.1186/1752-153X-2-3.
