
Sparse random feature maps for the item-multiset kernel.

Affiliations

Graduate School of Information Science and Technology, Hokkaido University, Kita 14, Nishi 9, Kita-ku, Sapporo, Hokkaido 060-0814, Japan.

Faculty of Information Science and Technology, Hokkaido University, Kita 14, Nishi 9, Kita-ku, Sapporo, Hokkaido 060-0814, Japan.

Publication Information

Neural Netw. 2021 Nov;143:500-514. doi: 10.1016/j.neunet.2021.06.024. Epub 2021 Jul 1.

Abstract

Random feature maps are a promising tool for large-scale kernel methods. Because most random feature maps generate dense random features, causing memory to explode, it is hard to apply them to very-large-scale sparse datasets. Factorization machines and related models, which use feature combinations efficiently, scale well to large-scale sparse datasets and have been used in many applications. However, their optimization problems are typically non-convex: although they are optimized by gradient-based iterative methods, such methods cannot in general find globally optimal solutions and require many iterations to converge. In this paper, we define the item-multiset kernel, a generalization of the itemset kernel and dot product kernels. Unfortunately, existing random feature maps for the itemset kernel and dot product kernels cannot approximate the item-multiset kernel. We therefore develop a method that converts an item-multiset kernel into an itemset kernel, enabling the item-multiset kernel to be approximated by a random feature map for the itemset kernel. We propose two random feature maps for the itemset kernel that run faster and are more memory efficient than the existing feature map for the itemset kernel. They also generate sparse random features when the original (input) feature vector is sparse, and thus linear models using the proposed methods scale to very-large-scale sparse datasets. Experiments on real-world datasets demonstrated the effectiveness of the proposed methodology: linear models using the proposed random feature maps ran 10 to 100 times faster than those based on existing methods.
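For context, a minimal sketch of the underlying definitions may help; the notation below follows the authors' earlier work on random feature maps for the itemset kernel (Atarashi et al., AAAI 2019) and is an assumption on our part, not reproduced from this paper. For a family of itemsets $\mathcal{S} \subseteq 2^{[d]}$, the itemset kernel between $x, y \in \mathbb{R}^d$ is

$$K_{\mathcal{S}}(x, y) = \sum_{V \in \mathcal{S}} \prod_{j \in V} x_j y_j .$$

Taking $\mathcal{S} = 2^{[d]}$ gives the all-subsets kernel $K(x, y) = \prod_{j=1}^{d}(1 + x_j y_j)$, and taking $\mathcal{S}$ to be the singletons gives the ordinary dot product. The item-multiset kernel, as the name suggests, extends the sum to multisets of items (repeated indices), which is plausibly what lets it also subsume dot product kernels such as the polynomial kernel; the precise definition is in the paper.

As an illustration of how a random feature map approximates such a kernel, the sketch below implements the classical Rademacher estimator for the all-subsets kernel: with each $\omega_j$ drawn uniformly from $\{-1, +1\}$, $\mathbb{E}[(1 + \omega_j x_j)(1 + \omega_j y_j)] = 1 + x_j y_j$, so the product over $j$ is an unbiased estimate of the kernel. This is a generic dense construction for intuition only, not either of the two faster, sparser feature maps proposed in the paper.

import numpy as np

def all_subsets_random_features(X, n_features, seed=None):
    # Rademacher random feature map for the all-subsets kernel
    #   K(x, y) = prod_j (1 + x_j * y_j).
    # Returns Z with E[Z @ Z.T] equal to the exact kernel matrix.
    rng = np.random.default_rng(seed)
    n_samples, d = X.shape
    # One Rademacher vector omega in {-1, +1}^d per random feature.
    W = rng.choice([-1.0, 1.0], size=(d, n_features))
    # phi_omega(x) = prod_j (1 + omega_j * x_j), computed for all
    # samples and all omegas at once; dividing by sqrt(n_features)
    # makes Z @ Z.T the average of n_features independent estimates.
    Z = np.prod(1.0 + X[:, :, None] * W[None, :, :], axis=1)
    return Z / np.sqrt(n_features)

# Toy check: the approximation should approach the exact kernel.
rng = np.random.default_rng(0)
X = 0.3 * rng.standard_normal((5, 8))
Z = all_subsets_random_features(X, n_features=20000, seed=1)
K_exact = np.prod(1.0 + X[:, None, :] * X[None, :, :], axis=2)
print(np.abs(Z @ Z.T - K_exact).max())

Note that when x has few nonzeros, most factors (1 + omega_j * x_j) equal 1; the dense sketch above does not exploit this, whereas producing sparse random features from sparse inputs is precisely the property the paper's proposed maps are designed to have.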

