Performance comparison of linear and non-linear feature selection methods for the analysis of large survey datasets.

Affiliations

Digital Health Hub, Simon Fraser University, Surrey, British Columbia, Canada.

Science and Technology for Aging Research Institute, Simon Fraser University, Surrey, British Columbia, Canada.

Publication information

PLoS One. 2019 Mar 21;14(3):e0213584. doi: 10.1371/journal.pone.0213584. eCollection 2019.

Abstract

Large survey databases for aging-related analysis are often examined to discover key factors that affect a dependent variable of interest. Typically, this analysis is performed with methods that assume linear dependencies between variables. Such assumptions, however, do not hold in many cases, where the data are linked by non-linear dependencies. This in turn calls for analytic methods that are more accurate in identifying potentially non-linear dependencies. Here, we objectively compared the feature selection performance of several frequently used linear selection methods and three non-linear selection methods in the context of large survey data. These methods were assessed using both synthetic and real-world datasets in which the relationships between the features and dependent variables were known in advance. We found that the non-linear methods offered better overall feature selection performance than the linear methods in all usage conditions. Moreover, the performance of the non-linear methods was more stable, being unaffected by the inclusion or exclusion of variables from the datasets. These properties make non-linear feature selection methods a potentially preferable tool for both hypothesis-driven and exploratory analyses of aging-related datasets.
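The gap between linear and non-linear selection described in the abstract can be illustrated with a small sketch. It is not taken from the paper: the abstract does not name the methods that were compared, so Pearson correlation stands in here for a linear score and a binned correlation ratio (eta squared) for a non-linear one. The synthetic data, the feature names (`x_lin`, `x_quad`, `x_noise`), and the helper functions are all hypothetical.

```python
import random
import statistics

def pearson_r(xs, ys):
    """Pearson correlation: a linear dependence score in [-1, 1]."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / (sxx * syy) ** 0.5

def correlation_ratio(xs, ys, n_bins=10):
    """Binned eta squared: share of the variance of y explained by the
    mean of y within equal-count bins of x. Sensitive to non-linear
    dependence, unlike Pearson correlation."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    size = len(xs) // n_bins
    grand_mean = statistics.fmean(ys)
    total = sum((y - grand_mean) ** 2 for y in ys)
    between = 0.0
    for b in range(n_bins):
        start = b * size
        end = len(xs) if b == n_bins - 1 else (b + 1) * size
        bin_ys = [ys[i] for i in order[start:end]]
        between += len(bin_ys) * (statistics.fmean(bin_ys) - grand_mean) ** 2
    return between / total

# Illustrative synthetic data (an assumption, not the paper's dataset):
# y depends linearly on x_lin, quadratically on x_quad, not at all on x_noise.
rng = random.Random(0)
n = 2000
features = {
    name: [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for name in ("x_lin", "x_quad", "x_noise")
}
y = [
    features["x_lin"][i] + 3.0 * features["x_quad"][i] ** 2 + rng.gauss(0.0, 0.1)
    for i in range(n)
]

linear_scores = {name: pearson_r(xs, y) for name, xs in features.items()}
nonlinear_scores = {name: correlation_ratio(xs, y) for name, xs in features.items()}

for name in features:
    print(f"{name:8s} pearson={linear_scores[name]:+.3f} eta2={nonlinear_scores[name]:.3f}")
```

In this toy setting the quadratic feature is close to invisible to the linear score, because positive and negative values of `x_quad` cancel in the covariance, while the binned score ranks it above the linear feature. The pure noise feature scores near zero under both measures.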

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/600c/6428288/4551555d3c2f/pone.0213584.g001.jpg
