Oneto Luca, Navarin Nicolo, Donini Michele, Ridella Sandro, Sperduti Alessandro, Aiolli Fabio, Anguita Davide
IEEE Trans Neural Netw Learn Syst. 2018 Oct;29(10):4660-4671. doi: 10.1109/TNNLS.2017.2771830. Epub 2017 Dec 4.
When dealing with kernel methods, one has to decide which kernel and which hyperparameter values to use. Resampling techniques can address this issue, but these procedures are time-consuming. The problem is particularly challenging for structured data, and for graphs in particular, since several kernels for graph data have been proposed in the literature but no clear relationship among them in terms of learning properties has been established. In these cases, exhaustive search seems to be the only reasonable approach. Recently, the global Rademacher complexity (RC) and local Rademacher complexity (LRC), two powerful measures of the complexity of a hypothesis space, have been shown to be well suited for studying kernel properties. In particular, the LRC can bound the generalization error of a hypothesis chosen from a space by disregarding those hypotheses that no learning procedure would select because of their high error. In this paper, we present a new approach to efficiently bound the RC of the space induced by a kernel, since its exact computation is an NP-hard problem. We then show, for the first time, that the RC can be used to estimate the accuracy and expressivity of different graph kernels under different parameter configurations. These claims are supported by experimental results on several real-world graph data sets.
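To make the central quantity concrete, the following is a minimal sketch (not the paper's own algorithm) of how the empirical Rademacher complexity of a kernel-induced hypothesis space can be estimated by Monte Carlo. It uses the standard closed form for the ball of radius B in an RKHS: the supremum over that ball of (1/n) Σᵢ σᵢ f(xᵢ) equals (B/n) √(σᵀKσ), so only the average over random sign vectors σ needs to be approximated. The function name, the toy RBF kernel, and all parameter values are illustrative assumptions.

```python
import numpy as np

def empirical_rademacher(K, B=1.0, n_draws=1000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity of the
    radius-B ball in the RKHS induced by the kernel matrix K.

    For this hypothesis space the inner supremum has a closed form:
        sup_{||f|| <= B} (1/n) * sum_i sigma_i * f(x_i)
            = (B/n) * sqrt(sigma^T K sigma),
    so only the expectation over Rademacher signs is sampled.
    """
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    # Draw n_draws independent Rademacher (+1/-1) sign vectors.
    sigmas = rng.choice([-1.0, 1.0], size=(n_draws, n))
    # Quadratic form sigma^T K sigma for each draw; clip tiny negatives
    # that can arise from floating-point error on a PSD matrix.
    quad = np.einsum('ij,jk,ik->i', sigmas, K, sigmas)
    return B / n * np.sqrt(np.maximum(quad, 0.0)).mean()

# Illustrative usage: an RBF kernel on toy data (not from the paper).
X = np.random.default_rng(1).normal(size=(50, 3))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq_dists)
print(empirical_rademacher(K))
```

By Jensen's inequality the estimate never exceeds (B/n) √(trace K), the classical kernel RC bound, which provides a quick sanity check on the sampler.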