HDSI: High dimensional selection with interactions algorithm on feature selection and testing.

Affiliations

Biostatistics Department, Princess Margaret Cancer Research Centre, Toronto, Ontario, Canada.

Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada.

Publication information

PLoS One. 2021 Feb 16;16(2):e0246159. doi: 10.1371/journal.pone.0246159. eCollection 2021.

Abstract

Feature selection on high-dimensional data, together with interaction effects, is a critical challenge for classical statistical learning techniques. Existing feature selection algorithms such as random LASSO leverage LASSO's capability to handle high-dimensional data. However, such techniques have two main limitations, namely the inability to consider interaction terms and the lack of a statistical test for determining the significance of selected features. This study proposes the High Dimensional Selection with Interactions (HDSI) algorithm, a new feature selection method that can handle high-dimensional data, incorporate interaction terms, provide statistical inference for the selected features, and leverage the capabilities of existing classical statistical techniques. The method applies any statistical technique, such as LASSO or subset selection, to multiple bootstrapped samples, each containing randomly selected features. Each bootstrap sample incorporates interaction terms for its randomly sampled features. The features selected from each model are pooled and their statistical significance is determined. The statistically significant features form the final output of the approach, and their final coefficients are estimated using appropriate statistical techniques. The performance of HDSI is evaluated using both simulated data and real studies. In general, HDSI outperforms commonly used algorithms such as LASSO, subset selection, adaptive LASSO, random LASSO and group LASSO.
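
The abstract describes the HDSI workflow only in prose, so a brief sketch may help make the bootstrap-and-pool idea concrete. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the function name hdsi_sketch, the parameters n_bootstraps, n_features_per_draw and ci_level, and the use of a bootstrap percentile interval as a stand-in for the paper's formal significance test are all assumptions introduced here. Cross-validated LASSO (scikit-learn's LassoCV) serves as the embedded learner, although any sparse technique could be substituted.

import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import PolynomialFeatures

def hdsi_sketch(X, y, n_bootstraps=200, n_features_per_draw=10,
                ci_level=0.95, random_state=0):
    """Bootstrap-and-pool feature selection with pairwise interactions.

    Illustrative sketch of an HDSI-style procedure; names and defaults
    are assumptions, not the published specification.
    """
    rng = np.random.default_rng(random_state)
    n, p = X.shape
    coef_pool = {}  # term name -> list of non-zero bootstrap coefficients

    for _ in range(n_bootstraps):
        rows = rng.integers(0, n, size=n)  # bootstrap rows (with replacement)
        cols = rng.choice(p, size=min(n_features_per_draw, p),
                          replace=False)   # random feature subset

        # Add pairwise interaction terms for the sampled features only.
        poly = PolynomialFeatures(degree=2, interaction_only=True,
                                  include_bias=False)
        Xb = poly.fit_transform(X[np.ix_(rows, cols)])
        names = poly.get_feature_names_out([f"x{c}" for c in cols])

        # Any sparse learner could be plugged in here; cross-validated
        # LASSO is one choice.
        model = LassoCV(cv=5).fit(Xb, y[rows])
        for name, beta in zip(names, model.coef_):
            if beta != 0.0:
                coef_pool.setdefault(name, []).append(beta)

    # Pool coefficients across bootstraps and keep terms whose percentile
    # interval excludes zero -- a simple stand-in for a formal test.
    alpha = (1.0 - ci_level) / 2.0
    selected = []
    for name, betas in coef_pool.items():
        lo, hi = np.quantile(betas, [alpha, 1.0 - alpha])
        if lo > 0.0 or hi < 0.0:
            selected.append(name)
    return sorted(selected)

Consistent with the abstract, a final model would then be refit on the selected main-effect and interaction terms (for example, by ordinary least squares) to obtain the reported coefficients.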

Fig 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b430/7886179/d813d042cff1/pone.0246159.g001.jpg

Similar articles

1. HDSI: High dimensional selection with interactions algorithm on feature selection and testing. PLoS One. 2021 Feb 16;16(2):e0246159. doi: 10.1371/journal.pone.0246159. eCollection 2021.
2. Artificial Intelligence based wrapper for high dimensional feature selection. BMC Bioinformatics. 2023 Oct 18;24(1):392. doi: 10.1186/s12859-023-05502-x.
3. High-dimensional feature selection by feature-wise kernelized Lasso. Neural Comput. 2014 Jan;26(1):185-207. doi: 10.1162/NECO_a_00537. Epub 2013 Oct 8.
4. Simultaneous channel and feature selection of fused EEG features based on Sparse Group Lasso. Biomed Res Int. 2015;2015:703768. doi: 10.1155/2015/703768. Epub 2015 Feb 24.
5. Interaction-Based Feature Selection for Uncovering Cancer Driver Genes Through Copy Number-Driven Expression Level. J Comput Biol. 2017 Feb;24(2):138-152. doi: 10.1089/cmb.2016.0140. Epub 2016 Oct 19.
6. The γ-OMP Algorithm for Feature Selection With Application to Gene Expression Data. IEEE/ACM Trans Comput Biol Bioinform. 2022 Mar-Apr;19(2):1214-1224. doi: 10.1109/TCBB.2020.3029952. Epub 2022 Apr 1.
7. CGUFS: A clustering-guided unsupervised feature selection algorithm for gene expression data. J King Saud Univ Comput Inf Sci. 2023 Oct;35(9):101731. doi: 10.1016/j.jksuci.2023.101731.
8. Stable feature selection for clinical prediction: exploiting ICD tree structure using Tree-Lasso. J Biomed Inform. 2015 Feb;53:277-90. doi: 10.1016/j.jbi.2014.11.013. Epub 2014 Dec 9.
9. Recursive Random Forests Enable Better Predictive Performance and Model Interpretation than Variable Selection by LASSO. J Chem Inf Model. 2015 Apr 27;55(4):736-46. doi: 10.1021/ci500715e. Epub 2015 Mar 16.

References cited in this article

1. Iterative random forests to discover predictive and stable high-order interactions. Proc Natl Acad Sci U S A. 2018 Feb 20;115(8):1943-1948. doi: 10.1073/pnas.1711236115. Epub 2018 Jan 19.
2. Variable selection - A review and recommendations for the practicing statistician. Biom J. 2018 May;60(3):431-449. doi: 10.1002/bimj.201700067. Epub 2018 Jan 2.
3. A non-linear data mining parameter selection algorithm for continuous variables. PLoS One. 2017 Nov 13;12(11):e0187676. doi: 10.1371/journal.pone.0187676. eCollection 2017.
4. Five myths about variable selection. Transpl Int. 2017 Jan;30(1):6-10. doi: 10.1111/tri.12895.
5. Univariate Screening Measures for Cluster Analysis. Multivariate Behav Res. 1995 Jul 1;30(3):385-427. doi: 10.1207/s15327906mbr3003_5.
6. Learning interactions via hierarchical group-lasso regularization. J Comput Graph Stat. 2015;24(3):627-654. doi: 10.1080/10618600.2014.938812. Epub 2015 Sep 16.
8. A lasso for hierarchical interactions. Ann Stat. 2013 Jun;41(3):1111-1141. doi: 10.1214/13-AOS1096.
10. Random lasso. Ann Appl Stat. 2011 Mar 1;5(1):468-485. doi: 10.1214/10-AOAS377.
