Zhigao Guo, Anthony C. Constantinou
Bayesian Artificial Intelligence Research Lab, School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS, UK.
The Alan Turing Institute, British Library, 96 Euston Road, London NW1 2DB, UK.
Entropy (Basel). 2020 Oct 10;22(10):1142. doi: 10.3390/e22101142.
Score-based algorithms that learn Bayesian Network (BN) structures provide solutions ranging from different levels of approximate learning to exact learning. Approximate solutions exist because exact learning is generally not applicable to networks of moderate or higher complexity. In general, approximate solutions tend to sacrifice accuracy for speed, where the aim is to minimise the loss in accuracy and maximise the gain in speed. While some approximate algorithms are optimised to handle thousands of variables, these algorithms may still be unable to learn such high-dimensional structures. Some of the most efficient score-based algorithms cast the structure learning problem as a combinatorial optimisation over candidate parent sets. This paper explores a strategy for pruning the size of candidate parent sets, which could form part of existing score-based algorithms as an additional pruning phase aimed at high-dimensionality problems. The results illustrate how different levels of pruning affect learning speed relative to the loss in model-fitting accuracy, and show that aggressive pruning may be required to produce approximate solutions for high-complexity problems.
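To make the idea of candidate-parent-set pruning concrete, the sketch below illustrates the standard lossless pruning rule for decomposable scores (such as BIC/BDeu): a candidate parent set can be discarded if some proper subset of it scores at least as well. This is a minimal illustration, not the paper's own (more aggressive, approximate) pruning strategy; the scores used here are synthetic, and `prune_candidate_parent_sets` is a hypothetical helper name.

```python
from itertools import combinations

# Hypothetical local scores for candidate parent sets of a single variable.
# Keys are frozensets of parent indices; higher score means a better fit.
# These numbers are synthetic, chosen only for illustration.
scores = {
    frozenset(): -105.2,
    frozenset({1}): -98.4,
    frozenset({2}): -101.7,
    frozenset({1, 2}): -99.0,
    frozenset({3}): -104.9,
    frozenset({1, 3}): -97.5,
    frozenset({2, 3}): -103.1,
    frozenset({1, 2, 3}): -100.2,
}

def prune_candidate_parent_sets(scores):
    """Keep a candidate parent set only if no proper subset scores at
    least as well -- the standard exact pruning rule for decomposable
    scores. More aggressive (lossy) pruning would discard further sets."""
    kept = {}
    for pset, s in scores.items():
        dominated = any(
            scores[frozenset(sub)] >= s
            for r in range(len(pset))
            for sub in combinations(pset, r)
            if frozenset(sub) in scores
        )
        if not dominated:
            kept[pset] = s
    return kept

pruned = prune_candidate_parent_sets(scores)
# {1, 2}, {2, 3} and {1, 2, 3} are dominated by a subset and are dropped;
# the empty set, {1}, {2}, {3} and {1, 3} survive.
```

Exact solvers apply this rule before the combinatorial optimisation step; the pruning phase the abstract describes would shrink the surviving collection further, trading some model-fitting accuracy for speed on high-dimensional problems.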