Unilever Centre for Molecular Science Informatics, Department of Chemistry, Lensfield Road, Cambridge, CB2 1EW, UK.
J Cheminform. 2013 Aug 23;5(1):37. doi: 10.1186/1758-2946-5-37.
In the last decade the standard Naive Bayes (SNB) algorithm has been widely employed in multi-class classification problems in cheminformatics. This popularity is mainly due to the fact that the algorithm is simple to implement and in many cases yields respectable classification results. Using clever heuristic arguments "anchored" by insightful cheminformatics knowledge, Xia et al. simplified the SNB algorithm further and termed the result the Laplacian Corrected Modified Naive Bayes (LCMNB) approach, which has been widely used in cheminformatics since its publication. In this note we mathematically illustrate the conditions under which Xia et al.'s simplification holds. It is our hope that this clarification will help Naive Bayes practitioners decide when it is appropriate to employ the LCMNB algorithm to classify large chemical datasets.
A general formulation that subsumes the simplified Naive Bayes version is presented. Unlike the widely used NB method, the Standard Naive Bayes description presented in this work is discriminative (not generative) in nature, which may open up further applications of the SNB method.
Starting from the standard Naive Bayes (SNB) algorithm, we have mathematically derived the relationship between Xia et al.'s ingenious but heuristic algorithm and the SNB approach. We have also demonstrated the conditions under which Xia et al.'s crucial assumptions hold. We hope that the new insight and recommendations provided will prove useful to the cheminformatics community.