Tao Qingping, Scott Stephen D, Vinodchandran N V, Osugi Thomas Takeo, Mueller Brandon
GC Image, LLC, Lincoln, NE 68505, USA.
IEEE Trans Pattern Anal Mach Intell. 2008 Dec;30(12):2084-98. doi: 10.1109/TPAMI.2007.70846.
The multiple-instance learning (MIL) model has been successful in numerous application areas. Recently, a generalization of this model and an algorithm for it were introduced, showing significant advantages over the conventional MIL model on certain application areas. Unfortunately, that algorithm is not scalable to high dimensions. We adapt that algorithm to one using a support vector machine with our new kernel k\wedge. This reduces the time complexity from exponential in the dimension to polynomial. Computing our new kernel is equivalent to counting the number of boxes in a discrete, bounded space that contain at least one point from each of two multisets. We show that this problem is #P-complete, but then give a fully polynomial randomized approximation scheme (FPRAS) for it. We then extend k\wedge by enriching its representation into a new kernel kmin, and also consider a normalized version of k\wedge that we call k\wedge/\vee (which may or may not not be a kernel, but whose approximation yielded positive semidefinite Gram matrices in practice). We then empirically evaluate all three measures on data from content-based image retrieval, biological sequence analysis, and the musk data sets. We found that our kernels performed well on all data sets relative to algorithms in the conventional MIL model.
多实例学习(MIL)模型在众多应用领域都取得了成功。最近,该模型的一种泛化形式及其算法被提出,在某些应用领域展现出相较于传统MIL模型的显著优势。不幸的是,该算法在高维情况下不可扩展。我们将该算法适配为使用带有新核函数(k^{\wedge})的支持向量机的算法。这将时间复杂度从维度的指数形式降低为多项式形式。计算我们的新核函数等同于计算一个离散有界空间中包含来自两个多重集各自至少一个点的盒子数量。我们证明这个问题是#P完全问题,但随后给出了一个针对它的完全多项式随机近似算法(FPRAS)。然后,我们通过将其表示丰富为新核函数(k_{min})来扩展(k^{\wedge}),并且还考虑了(k^{\wedge})的一个归一化版本,我们称之为(k^{\wedge}/\vee)(它可能是也可能不是一个核函数,但在实践中其近似产生了正定的格拉姆矩阵)。然后,我们基于基于内容的图像检索、生物序列分析和麝香数据集的数据对这三种度量进行实证评估。我们发现相对于传统MIL模型中的算法,我们的核函数在所有数据集上都表现良好。