Laboratory of Signal Processing and Speech Communication, Department of Electrical Engineering, Graz University of Technology, Inffeldgasse 16c, Graz A-8010, Austria.
IEEE Trans Pattern Anal Mach Intell. 2012 Mar;34(3):521-32. doi: 10.1109/TPAMI.2011.149.
We present a maximum margin parameter learning algorithm for Bayesian network classifiers using a conjugate gradient (CG) method for optimization. In contrast to previous approaches, we maintain the normalization constraints on the parameters of the Bayesian network during optimization, i.e., the probabilistic interpretation of the model is not lost. This enables us to handle missing features in discriminatively optimized Bayesian networks. In experiments, we compare the classification performance of maximum margin parameter learning to conditional likelihood and maximum likelihood learning approaches. Discriminative parameter learning significantly outperforms generative maximum likelihood estimation for naive Bayes and tree augmented naive Bayes structures on all considered data sets. Furthermore, maximizing the margin dominates the conditional likelihood approach in terms of classification performance in most cases. We provide results for a recently proposed maximum margin optimization approach based on convex relaxation. While the classification results are highly similar, our CG-based optimization is computationally up to orders of magnitude faster. Margin-optimized Bayesian network classifiers achieve classification performance comparable to support vector machines (SVMs) while using fewer parameters. Moreover, we show that feature values unexpectedly missing at classification time are easily handled by discriminatively optimized Bayesian network classifiers, a case in which discriminative classifiers usually require mechanisms to impute the unknown feature values first.
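To illustrate why retaining the probabilistic interpretation matters for missing features, the sketch below shows how a discrete naive Bayes classifier can marginalize out unobserved attributes at classification time: since the conditional probabilities of each feature sum to one, the factor for a missing feature is simply dropped. This is a minimal, self-contained illustration with invented toy parameters, not the authors' implementation or learning algorithm.

```python
import numpy as np

class NaiveBayes:
    """Discrete naive Bayes classifier (toy sketch, not the paper's code)."""

    def __init__(self, class_priors, cond_tables):
        # class_priors: P(c), shape (n_classes,)
        # cond_tables: one array per feature, P(x_i | c), shape (n_classes, n_values)
        self.log_priors = np.log(class_priors)
        self.log_tables = [np.log(t) for t in cond_tables]

    def predict(self, x):
        # x: list of feature values; None marks a missing feature.
        # A missing feature contributes no factor, which is exactly
        # marginalization: sum_v P(x_i = v | c) = 1 for every class c.
        log_post = self.log_priors.copy()
        for i, v in enumerate(x):
            if v is not None:
                log_post += self.log_tables[i][:, v]
        return int(np.argmax(log_post))

# Toy model: 2 classes, 2 binary features (parameters invented for illustration)
priors = np.array([0.5, 0.5])
tables = [np.array([[0.9, 0.1], [0.2, 0.8]]),   # P(x0 | c)
          np.array([[0.7, 0.3], [0.4, 0.6]])]   # P(x1 | c)
nb = NaiveBayes(priors, tables)

print(nb.predict([0, 0]))     # both features observed -> class 0
print(nb.predict([None, 1]))  # x0 missing: its factor is dropped -> class 1
```

A purely discriminative classifier such as an SVM has no such marginalization available and must first impute the missing value; a generatively interpretable model handles it natively, which is the property the discriminative training described above preserves.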