Similar Articles

1. Spectral methods in machine learning and new strategies for very large datasets.
Proc Natl Acad Sci U S A. 2009 Jan 13;106(2):369-74. doi: 10.1073/pnas.0810600105. Epub 2009 Jan 7.
2. Clustered Nyström method for large scale manifold learning and dimension reduction.
IEEE Trans Neural Netw. 2010 Oct;21(10):1576-87. doi: 10.1109/TNN.2010.2064786. Epub 2010 Aug 30.
3. Density-weighted Nyström method for computing large kernel eigensystems.
Neural Comput. 2009 Jan;21(1):121-46. doi: 10.1162/neco.2008.11-07-651.
4. A fast algorithm for learning a ranking function from large-scale data sets.
IEEE Trans Pattern Anal Mach Intell. 2008 Jul;30(7):1158-70. doi: 10.1109/TPAMI.2007.70776.
5. On landmark selection and sampling in high-dimensional data analysis.
Philos Trans A Math Phys Eng Sci. 2009 Nov 13;367(1906):4295-312. doi: 10.1098/rsta.2009.0161.
6. Kernel entropy component analysis.
IEEE Trans Pattern Anal Mach Intell. 2010 May;32(5):847-60. doi: 10.1109/TPAMI.2009.100.
7. Data classification with radial basis function networks based on a novel kernel density estimation algorithm.
IEEE Trans Neural Netw. 2005 Jan;16(1):225-36. doi: 10.1109/TNN.2004.836229.
8. Reduced support vector machines: a statistical theory.
IEEE Trans Neural Netw. 2007 Jan;18(1):1-13. doi: 10.1109/TNN.2006.883722.
9. Randomized algorithms for large-scale dictionary learning.
Neural Netw. 2024 Nov;179:106628. doi: 10.1016/j.neunet.2024.106628. Epub 2024 Aug 10.
10. Multi-Nyström Method Based on Multiple Kernel Learning for Large Scale Imbalanced Classification.
Comput Intell Neurosci. 2021 Jun 13;2021:9911871. doi: 10.1155/2021/9911871. eCollection 2021.

Cited By

1. Emotional Variance Analysis: A new sentiment analysis feature set for Artificial Intelligence and Machine Learning applications.
PLoS One. 2023 Jan 12;18(1):e0274299. doi: 10.1371/journal.pone.0274299. eCollection 2023.
2. A Machine Learning Approach for Early Diagnosis of Cognitive Impairment Using Population-Based Data.
J Alzheimers Dis. 2023;91(1):449-461. doi: 10.3233/JAD-220776.
3. Spectral clustering using Nyström approximation for the accurate identification of cancer molecular subtypes.
Sci Rep. 2017 Jul 7;7(1):4896. doi: 10.1038/s41598-017-05275-3.
4. Sampling from Determinantal Point Processes for Scalable Manifold Learning.
Inf Process Med Imaging. 2015;24:687-98. doi: 10.1007/978-3-319-19992-4_54.
5. Making sense of big data.
Proc Natl Acad Sci U S A. 2013 Nov 5;110(45):18031-2. doi: 10.1073/pnas.1317797110. Epub 2013 Oct 21.
6. On landmark selection and sampling in high-dimensional data analysis.
Philos Trans A Math Phys Eng Sci. 2009 Nov 13;367(1906):4295-312. doi: 10.1098/rsta.2009.0161.

References

1. Randomized algorithms for the low-rank approximation of matrices.
Proc Natl Acad Sci U S A. 2007 Dec 18;104(51):20167-72. doi: 10.1073/pnas.0709640104. Epub 2007 Dec 4.
2. Hessian eigenmaps: locally linear embedding techniques for high-dimensional data.
Proc Natl Acad Sci U S A. 2003 May 13;100(10):5591-6. doi: 10.1073/pnas.1031596100. Epub 2003 Apr 30.
3. Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps.
Proc Natl Acad Sci U S A. 2005 May 24;102(21):7426-31. doi: 10.1073/pnas.0500334102. Epub 2005 May 17.
4. Spectral grouping using the Nyström method.
IEEE Trans Pattern Anal Mach Intell. 2004 Feb;26(2):214-25. doi: 10.1109/TPAMI.2004.1262185.
5. A global geometric framework for nonlinear dimensionality reduction.
Science. 2000 Dec 22;290(5500):2319-23. doi: 10.1126/science.290.5500.2319.

Spectral methods in machine learning and new strategies for very large datasets.

Authors

Belabbas Mohamed-Ali, Wolfe Patrick J

Affiliation

Department of Statistics, School of Engineering and Applied Sciences, Oxford Street, Harvard University, Cambridge, MA 02138, USA.

Publication Info

Proc Natl Acad Sci U S A. 2009 Jan 13;106(2):369-74. doi: 10.1073/pnas.0810600105. Epub 2009 Jan 7.

DOI: 10.1073/pnas.0810600105
PMID: 19129490
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2626709/
Abstract

Spectral methods are of fundamental importance in statistics and machine learning, because they underlie algorithms from classical principal components analysis to more recent approaches that exploit manifold structure. In most cases, the core technical problem can be reduced to computing a low-rank approximation to a positive-definite kernel. For the growing number of applications dealing with very large or high-dimensional datasets, however, the optimal approximation afforded by an exact spectral decomposition is too costly, because its complexity scales as the cube of either the number of training examples or their dimensionality. Motivated by such applications, we present here 2 new algorithms for the approximation of positive-semidefinite kernels, together with error bounds that improve on results in the literature. We approach this problem by seeking to determine, in an efficient manner, the most informative subset of our data relative to the kernel approximation task at hand. This leads to two new strategies based on the Nyström method that are directly applicable to massive datasets. The first of these-based on sampling-leads to a randomized algorithm whereupon the kernel induces a probability distribution on its set of partitions, whereas the latter approach-based on sorting-provides for the selection of a partition in a deterministic way. We detail their numerical implementation and provide simulation results for a variety of representative problems in statistical data analysis, each of which demonstrates the improved performance of our approach relative to existing methods.
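To make the core primitive concrete, here is a minimal NumPy sketch of the Nyström extension with a sampling-based landmark choice. This is only an illustration, not the paper's algorithm: the toy polynomial kernel and the diagonal-proportional sampling probabilities are simplified stand-ins for the kernel-induced distribution on subsets that the paper actually analyzes.

```python
import numpy as np

def nystrom_approximation(K, landmark_idx):
    """Low-rank approximation of a PSD kernel matrix K built from
    a subset of its columns (the classical Nystrom extension)."""
    idx = np.asarray(landmark_idx)
    C = K[:, idx]                 # n x k block of sampled columns
    W = K[np.ix_(idx, idx)]       # k x k block on the landmarks
    return C @ np.linalg.pinv(W) @ C.T

# Toy data: a degree-2 polynomial kernel on random 2-D points.
# Its exact rank is at most 6, so a few landmarks suffice.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
K = (1.0 + X @ X.T) ** 2

# Sampling-based landmark selection: draw landmarks with probability
# proportional to the kernel diagonal -- a simplified proxy for the
# kernel-induced distribution described in the abstract.
p = np.diag(K) / np.trace(K)
idx = rng.choice(len(X), size=20, replace=False, p=p)
K_hat = nystrom_approximation(K, idx)

rel_err = np.linalg.norm(K - K_hat, "fro") / np.linalg.norm(K, "fro")
print(f"relative Frobenius error: {rel_err:.2e}")
```

Because the kernel here has low exact rank, the approximation is essentially exact once the landmarks span its column space; the cost is O(nk² + k³) rather than the O(n³) of a full eigendecomposition, which is the scaling advantage the abstract refers to.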
