Silva Catarina, Ribeiro Bernardete
Departamento Eng. Informática, Universidade de Coimbra, Portugal.
Int J Neural Syst. 2008 Feb;18(1):45-58. doi: 10.1142/S0129065708001361.
In this paper we develop and analyze methods for expanding automated learning of Relevance Vector Machines (RVMs) to large-scale text sets. RVMs rely on Bayesian inference and, while maintaining state-of-the-art performance, offer sparse, probabilistic solutions. However, past efforts to apply RVMs to large-scale sets have met with limited success due to computational constraints. We propose a diversified set of divide-and-conquer approaches in which decomposition techniques define smaller working sets, permitting the use of all training examples. The rationale is that by exploring incremental, ensemble, and boosting strategies, classification performance can be improved by taking advantage of the large training set available. Results on Reuters-21578 and RCV1 are presented, showing performance gains while maintaining sparse solutions that can be deployed in distributed environments.
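The divide-and-conquer idea described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's method: a tiny logistic-regression trainer stands in for RVM learning (RVM implementations are not part of standard libraries), and the synthetic data, partition count, and majority-vote combination are all assumptions made for the example. What it shows is the decomposition scheme itself: partition a large training set into smaller working sets so every example is used, train one classifier per partition, and combine them as an ensemble.

```python
# Hedged sketch of the divide-and-conquer ensemble idea: a logistic-
# regression trainer stands in for RVM learning; data and settings
# are synthetic assumptions made purely for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "large" two-class set (stand-in for TF-IDF text features).
X = rng.normal(size=(3000, 20))
w_true = rng.normal(size=20)
y = (X @ w_true > 0).astype(int)

def train_linear(Xs, ys, lr=0.1, epochs=50):
    """Tiny logistic-regression trainer (placeholder for RVM learning)."""
    w = np.zeros(Xs.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xs @ w)))   # sigmoid predictions
        w += lr * Xs.T @ (ys - p) / len(ys)   # gradient ascent step
    return w

# Decompose the full set into smaller working sets so that all
# training examples are used, each exactly once.
n_parts = 6
models = [train_linear(Xp, yp)
          for Xp, yp in zip(np.array_split(X, n_parts),
                            np.array_split(y, n_parts))]

# Combine the per-partition classifiers by majority vote.
votes = np.stack([(X @ w > 0).astype(int) for w in models])
y_hat = (votes.mean(axis=0) > 0.5).astype(int)
print(f"ensemble training accuracy: {(y_hat == y).mean():.3f}")
```

Because each partition's model is trained independently, the scheme maps naturally onto the distributed deployment the abstract mentions: partitions can be trained on separate nodes and only the (sparse) models need to be exchanged for voting.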