Performance comparison of linear and non-linear feature selection methods for the analysis of large survey datasets.

Affiliations

Digital Health Hub, Simon Fraser University, Surrey, British Columbia, Canada.

Science and Technology for Aging Research Institute, Simon Fraser University, Surrey, British Columbia, Canada.

Publication information

PLoS One. 2019 Mar 21;14(3):e0213584. doi: 10.1371/journal.pone.0213584. eCollection 2019.

DOI:10.1371/journal.pone.0213584
PMID:30897097
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6428288/
Abstract

Large survey databases for aging-related analysis are often examined to discover key factors that affect a dependent variable of interest. Typically, this analysis is performed with methods that assume linear dependencies between variables. Such assumptions, however, do not hold in many cases, where variables are linked by non-linear dependencies. This in turn calls for analytic methods that identify potentially non-linear dependencies more accurately. Here, we objectively compared the feature selection performance of several frequently used linear selection methods and three non-linear selection methods in the context of large survey data. These methods were assessed using both synthetic and real-world datasets, in which the relationships between the features and the dependent variables were known in advance. We found that the non-linear methods offered better overall feature selection performance than the linear methods in all usage conditions. Moreover, the performance of the non-linear methods was more stable, being unaffected by the inclusion or exclusion of variables from the datasets. These properties make non-linear feature selection methods a potentially preferable tool for both hypothesis-driven and exploratory analyses of aging-related datasets.
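
The contrast drawn in the abstract can be illustrated with a small synthetic sketch. The abstract does not name the specific methods that were compared, so the choices here are assumptions made purely for illustration: absolute Pearson correlation stands in for a linear relevance score, and sample distance correlation stands in for a non-linear one. The dataset, the feature names `x_quad` and `x_noise`, and the helper names are hypothetical and are not taken from the paper.

```python
import math
import random


def pearson(x, y):
    """Absolute Pearson correlation, used here as a linear relevance score."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / math.sqrt(sxx * syy)


def _centered(v):
    """Pairwise |v_i - v_j| matrix with row, column and grand means removed."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]  # d is symmetric, so column means equal row means
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def _mean_product(a, b):
    """Mean of the element-wise product of two n x n matrices."""
    n = len(a)
    return sum(p * q for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / (n * n)


def distance_correlation(x, y):
    """Sample distance correlation, used here as a non-linear relevance score."""
    a, b = _centered(x), _centered(y)
    dcov2 = max(_mean_product(a, b), 0.0)  # guard against tiny negative rounding
    dvar2 = _mean_product(a, a) * _mean_product(b, b)
    return math.sqrt(dcov2 / math.sqrt(dvar2))


def make_data(seed=0, n=300):
    """Hypothetical data: y depends on x_quad quadratically; x_noise is irrelevant."""
    rng = random.Random(seed)
    x_quad = [rng.uniform(-1, 1) for _ in range(n)]
    x_noise = [rng.uniform(-1, 1) for _ in range(n)]
    y = [x * x + rng.gauss(0, 0.1) for x in x_quad]
    return {"x_quad": x_quad, "x_noise": x_noise}, y


def rank_features(scores):
    """Feature names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    features, y = make_data()
    for name, score in (("pearson", pearson),
                        ("distance correlation", distance_correlation)):
        scores = {k: score(v, y) for k, v in features.items()}
        print(name, {k: round(s, 3) for k, s in scores.items()}, rank_features(scores))
```

Because `x_quad` is symmetric around zero and enters `y` only through its square, its Pearson score stays near zero even though it fully determines the signal, whereas the distance correlation score separates it from the irrelevant feature. This is only a toy case of the general point; it does not reproduce the paper's datasets or evaluation.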


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/600c/6428288/4551555d3c2f/pone.0213584.g001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/600c/6428288/60ae716ab40d/pone.0213584.g002.jpg

Similar articles

1
Performance comparison of linear and non-linear feature selection methods for the analysis of large survey datasets.
PLoS One. 2019 Mar 21;14(3):e0213584. doi: 10.1371/journal.pone.0213584. eCollection 2019.
2
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
3
Comparison of five supervised feature selection algorithms leading to top features and gene signatures from multi-omics data in cancer.
BMC Bioinformatics. 2022 Apr 28;23(Suppl 3):153. doi: 10.1186/s12859-022-04678-y.
4
An experimental comparison of feature selection methods on two-class biomedical datasets.
Comput Biol Med. 2015 Nov 1;66:1-10. doi: 10.1016/j.compbiomed.2015.08.010. Epub 2015 Aug 24.
5
Evaluating the impact of multivariate imputation by MICE in feature selection.
PLoS One. 2021 Jul 28;16(7):e0254720. doi: 10.1371/journal.pone.0254720. eCollection 2021.
6
Kernel-imbedded Gaussian processes for disease classification using microarray gene expression data.
BMC Bioinformatics. 2007 Feb 28;8:67. doi: 10.1186/1471-2105-8-67.
7
Improving the Mann-Whitney statistical test for feature selection: an approach in breast cancer diagnosis on mammography.
Artif Intell Med. 2015 Jan;63(1):19-31. doi: 10.1016/j.artmed.2014.12.004. Epub 2014 Dec 12.
8
Stable feature selection for clinical prediction: exploiting ICD tree structure using Tree-Lasso.
J Biomed Inform. 2015 Feb;53:277-90. doi: 10.1016/j.jbi.2014.11.013. Epub 2014 Dec 9.
9
Learning mixed graphical models with separate sparsity parameters and stability-based model selection.
BMC Bioinformatics. 2016 Jun 6;17 Suppl 5(Suppl 5):175. doi: 10.1186/s12859-016-1039-0.
10
SVM-RFE: selection and visualization of the most relevant features through non-linear kernels.
BMC Bioinformatics. 2018 Nov 19;19(1):432. doi: 10.1186/s12859-018-2451-4.

Cited by

1
Advanced machine learning framework for thyroid cancer epidemiology in Iran through integration of environmental socioeconomic and health system predictors.
Sci Rep. 2025 Aug 14;15(1):29901. doi: 10.1038/s41598-025-15324-x.
2
Interpretable machine learning for precision cognitive aging.
Front Comput Neurosci. 2025 May 16;19:1560064. doi: 10.3389/fncom.2025.1560064. eCollection 2025.
3
Comparison of machine learning methods for genomic prediction of selected Arabidopsis thaliana traits.
PLoS One. 2024 Aug 28;19(8):e0308962. doi: 10.1371/journal.pone.0308962. eCollection 2024.
4
A Subtype Perspective on Cognitive Trajectories in Healthy Aging.
Brain Sci. 2024 Apr 1;14(4):351. doi: 10.3390/brainsci14040351.
5
Autoencoder Composite Scoring to Evaluate Prosthetic Performance in Individuals with Lower Limb Amputation.
Bioengineering (Basel). 2022 Oct 18;9(10):572. doi: 10.3390/bioengineering9100572.
6
Cuffless Blood Pressure Measurement Using Linear and Nonlinear Optimized Feature Selection.
Diagnostics (Basel). 2022 Feb 5;12(2):408. doi: 10.3390/diagnostics12020408.
7
Robotic Kinematic measures of the arm in chronic Stroke: part 2 - strong correlation with clinical outcome measures.
Bioelectron Med. 2021 Dec 29;7(1):21. doi: 10.1186/s42234-021-00082-8.
8
Healthy memory aging - the benefits of regular daily activities increase with age.
Aging (Albany NY). 2021 Dec 16;13(24):25643-25652. doi: 10.18632/aging.203753.
9
A roadmap for multi-omics data integration using deep learning.
Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab454.
10
A Classification Approach for Cancer Survivors from Those Cancer-Free, Based on Health Behaviors: Analysis of the Lifelines Cohort.
Cancers (Basel). 2021 May 12;13(10):2335. doi: 10.3390/cancers13102335.

References

1
Regression assumptions in clinical psychology research practice-a systematic review of common misconceptions.
PeerJ. 2017 May 16;5:e3323. doi: 10.7717/peerj.3323. eCollection 2017.
2
Cohort profile: Wisconsin longitudinal study (WLS).
Int J Epidemiol. 2014 Feb;43(1):34-41. doi: 10.1093/ije/dys194.
3
Widowhood, age heterogamy, and health: the role of selection, marital quality, and health behaviors.
J Gerontol B Psychol Sci Soc Sci. 2014 Jan;69(1):123-34. doi: 10.1093/geronb/gbt104. Epub 2013 Oct 15.
4
Weight status in adolescence is associated with later life functional limitations.
J Aging Health. 2013 Aug;25(5):758-75. doi: 10.1177/0898264313491426. Epub 2013 Jun 10.
5
Benefits of educational attainment on adult fluid cognition: international evidence from three birth cohorts.
Int J Epidemiol. 2012 Dec;41(6):1729-36. doi: 10.1093/ije/dys148. Epub 2012 Oct 28.
6
Data mining in the Life Sciences with Random Forest: a walk in the park or lost in the jungle?
Brief Bioinform. 2013 May;14(3):315-26. doi: 10.1093/bib/bbs034. Epub 2012 Jul 10.
7
On Brownian Distance Covariance and High Dimensional Data.
Ann Appl Stat. 2009 Jan 1;3(4):1266-1269. doi: 10.1214/09-AOAS312.
8
Cigarette smoking: health effects and control strategies.
Drugs Today (Barc). 2008 Dec;44(12):895-904. doi: 10.1358/dot.2008.44.12.1308898.
9
A review of feature selection techniques in bioinformatics.
Bioinformatics. 2007 Oct 1;23(19):2507-17. doi: 10.1093/bioinformatics/btm344. Epub 2007 Aug 24.
10
Sensitivity to reward and body mass index (BMI): evidence for a non-linear relationship.
Appetite. 2008 Jan;50(1):43-9. doi: 10.1016/j.appet.2007.05.007. Epub 2007 May 29.