Exploring unsupervised feature extraction algorithms: tackling high dimensionality in small datasets.

Authors

Niu Hongqi, McCallum Gabrielle B, Chang Anne B, Khan Khalid, Azam Sami

Affiliations

Faculty of Science and Technology, Charles Darwin University, Darwin, Northern Territory, 0909, Australia.

Child and Maternal Health Division and NHMRC Centre for Research Excellence in Paediatric Bronchiectasis (AusBREATHE), Menzies School of Health Research, Charles Darwin University, Darwin, Northern Territory, 0810, Australia.

Publication information

Sci Rep. 2025 Jul 1;15(1):21973. doi: 10.1038/s41598-025-07725-9.

DOI:10.1038/s41598-025-07725-9
PMID:40595281
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12216002/
Abstract

Small datasets are common in many fields due to factors such as limited data collection opportunities or privacy concerns. These datasets often contain high-dimensional features, which poses significant dimensionality challenges: data are sparse in high-dimensional spaces, making it difficult to extract meaningful information and leading to less accurate predictive models. Feature extraction algorithms address these challenges by reducing dimensionality while retaining essential information. These algorithms can be classified as supervised, unsupervised, or semi-supervised, and as linear or nonlinear. This review focuses on unsupervised feature extraction algorithms (UFEAs) because they can handle high-dimensional data without relying on labelled information. Eight representative UFEAs were selected: principal component analysis (PCA), classical multidimensional scaling, kernel PCA, isometric mapping, locally linear embedding, Laplacian eigenmaps, independent component analysis, and autoencoders. We present the theoretical background of these algorithms and discuss their conceptual viewpoints, such as whether they are linear or nonlinear, manifold-based, probability density function-based, or neural network-based. After classifying the algorithms under these taxonomies, we systematically review the working mechanism of each, providing a detailed algorithmic explanation for each UFEA. We also explore how these mechanisms contribute to effective dimensionality reduction, particularly in small datasets with high dimensionality. Furthermore, we compare the algorithms in terms of transformation approach, goals, parameters, and computational complexity. Finally, we evaluate each algorithm against state-of-the-art research using various datasets to assess accuracy, highlighting which algorithm is most appropriate for specific scenarios. Overall, this review provides insights into the strengths and weaknesses of various UFEAs and offers guidance on selecting appropriate algorithms for small high-dimensional datasets.
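
As a rough illustration of two of the linear methods named in the abstract (principal component analysis and classical multidimensional scaling), the sketch below applies both to a synthetic dataset with far fewer samples than features. It is not the authors' code: the function names `pca_embed` and `classical_mds_embed`, the random data, and the choice of two output dimensions are assumptions made for the example. It relies only on NumPy and on the standard result that classical MDS on Euclidean distances gives the same embedding as PCA, up to the sign of each axis.

```python
import numpy as np


def pca_embed(X, k):
    """Project the rows of X onto the top-k principal components via SVD."""
    Xc = X - X.mean(axis=0)                      # centre each feature
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k]                      # component scores


def classical_mds_embed(X, k):
    """Classical MDS computed from pairwise squared Euclidean distances."""
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)   # squared distances
    n = X.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    B = -0.5 * J @ D2 @ J                        # double-centred Gram matrix
    w, V = np.linalg.eigh(B)                     # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]                # indices of the k largest
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))


# Synthetic "small, high-dimensional" data: 20 samples, 500 features (assumed).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 500))

Z_pca = pca_embed(X, 2)
Z_mds = classical_mds_embed(X, 2)

# The two embeddings should agree up to a sign flip per axis.
same_up_to_sign = np.allclose(np.abs(Z_pca), np.abs(Z_mds), atol=1e-6)
print(Z_pca.shape, Z_mds.shape, same_up_to_sign)
```

Note that the MDS route only needs to decompose an n × n matrix (20 × 20 here) rather than anything of size 500 × 500, which is one reason Gram-matrix formulations are convenient when samples are scarce and features are many.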

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6cf3/12216002/f71afc135b20/41598_2025_7725_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6cf3/12216002/2f7941f65928/41598_2025_7725_Figa_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6cf3/12216002/feb0a0b1c698/41598_2025_7725_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6cf3/12216002/1990975f491f/41598_2025_7725_Figb_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6cf3/12216002/bf8544687cfb/41598_2025_7725_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6cf3/12216002/d6be3ac81ea3/41598_2025_7725_Figc_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6cf3/12216002/caaf72d8c157/41598_2025_7725_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6cf3/12216002/90430eee84d0/41598_2025_7725_Figd_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6cf3/12216002/a6464ca3f4ad/41598_2025_7725_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6cf3/12216002/d3047ee20907/41598_2025_7725_Fige_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6cf3/12216002/f12fb96c940b/41598_2025_7725_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6cf3/12216002/f48faff7ba9f/41598_2025_7725_Figf_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6cf3/12216002/502677259bde/41598_2025_7725_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6cf3/12216002/4157f987c269/41598_2025_7725_Figg_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6cf3/12216002/1c2fe0d6d7cc/41598_2025_7725_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6cf3/12216002/cad0ea708466/41598_2025_7725_Figh_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6cf3/12216002/eba58279aca6/41598_2025_7725_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6cf3/12216002/6ccfd691cca0/41598_2025_7725_Fig10_HTML.jpg
