Long Yuguang, Wang Limin, Sun Minghui
College of Software, Jilin University, Changchun 130012, China.
Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China.
Entropy (Basel). 2019 Jul 25;21(8):721. doi: 10.3390/e21080721.
Owing to the simplicity and competitive classification performance of naive Bayes (NB), researchers have proposed many approaches that improve NB by weakening its attribute independence assumption. Theoretical analysis based on Kullback-Leibler divergence shows that the difference between NB and its variants lies in the different orders of conditional mutual information represented by the augmenting edges in the tree-shaped network structure. In this paper, we propose to relax the independence assumption by further generalizing tree-augmented naive Bayes (TAN) from a 1-dependence Bayesian network classifier (BNC) to arbitrary k-dependence. Sub-models of TAN, built to represent specific conditional dependence relationships, may "best match" the conditional probability distribution over the training data. Extensive experimental results reveal that the proposed algorithm achieves a bias-variance trade-off and substantially better generalization performance than state-of-the-art classifiers such as logistic regression.
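To make the abstract's key quantity concrete, the sketch below computes the empirical conditional mutual information I(X_i; X_j | C) that weights TAN's augmenting edges, then builds a maximum-weight spanning tree over those weights in the Chow-Liu style used by the standard TAN construction. This is a minimal illustration, not the authors' implementation: the function names, the toy dataset, and the choice of Prim's algorithm for the spanning tree are assumptions.

```python
import itertools
from collections import Counter
from math import log


def cond_mutual_info(data, i, j, c_idx):
    """Empirical I(X_i; X_j | C) from rows of discrete values.

    Each row is a tuple of attribute values with the class label at c_idx.
    """
    n = len(data)
    c_counts = Counter(row[c_idx] for row in data)
    ic = Counter((row[i], row[c_idx]) for row in data)
    jc = Counter((row[j], row[c_idx]) for row in data)
    ijc = Counter((row[i], row[j], row[c_idx]) for row in data)
    cmi = 0.0
    for (xi, xj, c), n_ijc in ijc.items():
        p_ijc = n_ijc / n            # P(x_i, x_j, c)
        p_c = c_counts[c] / n        # P(c)
        p_ic = ic[(xi, c)] / n       # P(x_i, c)
        p_jc = jc[(xj, c)] / n       # P(x_j, c)
        cmi += p_ijc * log(p_ijc * p_c / (p_ic * p_jc))
    return cmi


def tan_edges(data, n_attrs, c_idx):
    """TAN augmenting edges: maximum spanning tree (Prim) over CMI weights."""
    weights = {(i, j): cond_mutual_info(data, i, j, c_idx)
               for i, j in itertools.combinations(range(n_attrs), 2)}
    in_tree = {0}                    # root the tree at attribute 0
    edges = []
    while len(in_tree) < n_attrs:
        # pick the heaviest edge crossing from the tree to a new attribute
        best = max(((u, v) for u in in_tree
                    for v in range(n_attrs) if v not in in_tree),
                   key=lambda e: weights[(min(e), max(e))])
        edges.append(best)
        in_tree.add(best[1])
    return edges
```

On this toy data the attributes X0 and X1 are perfectly correlated given the class, so the edge (0, 1) carries weight log 2 and is always selected first; a k-dependence generalization, as discussed in the abstract, would allow each attribute more than one such parent.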