Feature selection for high-dimensional temporal data.

Affiliation

Department of Computer Science, University of Crete, Voutes Campus, Heraklion, 70013, Greece.

Publication

BMC Bioinformatics. 2018 Jan 23;19(1):17. doi: 10.1186/s12859-018-2023-7.

Abstract

BACKGROUND

Feature selection is commonly employed for identifying collectively predictive biomarkers and biosignatures; it facilitates the construction of small statistical models that are easier to verify, visualize, and comprehend while providing insight to the human expert. In this work, we extend established constraint-based feature-selection methods to high-dimensional "omics" temporal data, where the number of measurements is orders of magnitude larger than the sample size. The extension required the development of conditional independence tests for temporal and/or static variables conditioned on a set of temporal variables.
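Constraint-based feature selection is driven by conditional independence (CI) tests: roughly, a feature is kept only if it remains dependent on the target given the features already selected. The temporal CI tests developed in this work are not reproduced here. As a stand-in, the following is a minimal sketch of a standard CI test for static, continuous, approximately Gaussian data, the Fisher z test on a partial correlation. The function name `fisher_z_ci_test` and the toy data are illustrative assumptions, not part of the paper.

```python
import math

import numpy as np


def fisher_z_ci_test(data, x, y, cond=()):
    """Two-sided p-value for H0: column x is independent of column y given columns `cond`.

    Sketch only (hypothetical helper): assumes continuous, roughly Gaussian,
    static (non-temporal) data; it is NOT the temporal test from the paper.
    """
    idx = [x, y, *cond]
    # Partial correlation of x and y given cond, read off the inverse correlation matrix.
    prec = np.linalg.pinv(np.corrcoef(data[:, idx], rowvar=False))
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = min(max(r, -0.9999999), 0.9999999)  # keep Fisher's z finite
    # Fisher's z-transform; approximately N(0, 1) under H0.
    z = math.atanh(r) * math.sqrt(data.shape[0] - len(cond) - 3)
    return math.erfc(abs(z) / math.sqrt(2))


# Toy data: x and y both depend on z, so they are dependent marginally
# but conditionally independent given z.
rng = np.random.default_rng(0)
z = rng.normal(size=500)
x = z + 0.5 * rng.normal(size=500)
y = z + 0.5 * rng.normal(size=500)
data = np.column_stack([x, y, z])

p_marg = fisher_z_ci_test(data, 0, 1)        # strong marginal dependence: very small
p_cond = fisher_z_ci_test(data, 0, 1, (2,))  # H0 holds here, so typically not small
print(p_marg, p_cond)
```

Reading the partial correlation off the (pseudo-)inverse of the correlation matrix avoids fitting explicit regressions for each conditioning set, which is convenient when the same test is called many times during selection.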

RESULTS

The algorithm is able to return multiple, equivalent solution subsets of variables, scale to tens of thousands of features, and outperform or be on par with existing methods, depending on the specifics of the analysis task.
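The idea of multiple, equivalent solution subsets can be illustrated on toy data. The sketch below is a simplified, CI-test-driven forward-backward selection followed by a naive equivalence check; it is an illustrative assumption, not the authors' algorithm, and it uses a static Fisher z test instead of the temporal tests of the paper. All names (`ci_pvalue`, `forward_backward`, `equivalent_features`) and the equivalence criterion are hypothetical.

```python
import math

import numpy as np


def ci_pvalue(data, x, y, cond=()):
    """Fisher z p-value for H0: column x independent of column y given `cond` (static stand-in)."""
    idx = [x, y, *cond]
    prec = np.linalg.pinv(np.corrcoef(data[:, idx], rowvar=False))
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = min(max(r, -0.9999999), 0.9999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(data.shape[0] - len(cond) - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def forward_backward(data, target, candidates, alpha=0.05):
    """Hypothetical greedy selection of features that stay dependent on `target`."""
    selected = []
    # Forward phase: add the most dependent candidate given what is already selected.
    while True:
        pvals = {c: ci_pvalue(data, c, target, selected)
                 for c in candidates if c not in selected}
        if not pvals:
            break
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:
            break
        selected.append(best)
    # Backward phase: drop features that became independent of target given the others.
    for f in list(selected):
        rest = [s for s in selected if s != f]
        if ci_pvalue(data, f, target, rest) >= alpha:
            selected.remove(f)
    return selected


def equivalent_features(data, target, selected, candidates, alpha=0.05):
    """Naive check (assumption): c can replace s if each is redundant given the other plus the rest."""
    equiv = {}
    for s in selected:
        rest = [t for t in selected if t != s]
        equiv[s] = [c for c in candidates
                    if c not in selected
                    and ci_pvalue(data, c, target, selected) >= alpha
                    and ci_pvalue(data, s, target, rest + [c]) >= alpha]
    return equiv


# Toy data: x0 and x1 are near-duplicate measurements of one signal, x2 is irrelevant.
rng = np.random.default_rng(1)
n = 500
signal = rng.normal(size=n)
x0 = signal + 1e-3 * rng.normal(size=n)
x1 = signal + 1e-3 * rng.normal(size=n)
x2 = rng.normal(size=n)
t = signal + 0.5 * rng.normal(size=n)  # target
data = np.column_stack([x0, x1, x2, t])

# A strict alpha keeps chance false positives rare across the several tests run here.
sel = forward_backward(data, 3, [0, 1, 2], alpha=1e-4)
eq = equivalent_features(data, 3, sel, [0, 1, 2], alpha=1e-4)
print(sel, eq)
```

Once either near-duplicate is selected, the other adds no information about the target, and swapping one for the other leaves the target equally well explained, so under this toy criterion the two single-feature subsets are reported as equivalent.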

CONCLUSIONS

We suggest using this algorithm for variable selection in high-dimensional temporal data.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c334/5778658/8e1f067e5a6e/12859_2018_2023_Fig1_HTML.jpg

Similar articles

1
Artificial Intelligence based wrapper for high dimensional feature selection.
BMC Bioinformatics. 2023 Oct 18;24(1):392. doi: 10.1186/s12859-023-05502-x.
2
The γ-OMP Algorithm for Feature Selection With Application to Gene Expression Data.
IEEE/ACM Trans Comput Biol Bioinform. 2022 Mar-Apr;19(2):1214-1224. doi: 10.1109/TCBB.2020.3029952. Epub 2022 Apr 1.
3
Sparse canonical correlation analysis with application to genomic data integration.
Stat Appl Genet Mol Biol. 2009;8:Article 1. doi: 10.2202/1544-6115.1406. Epub 2009 Jan 6.
4
Feature selection with the R package .
F1000Res. 2018 Sep 20;7:1505. doi: 10.12688/f1000research.16216.2. eCollection 2018.
5
The ant colony algorithm for feature selection in high-dimension gene expression data for disease classification.
Math Med Biol. 2007 Dec;24(4):413-26. doi: 10.1093/imammb/dqn001. Epub 2008 Feb 22.
6
Efficient ℓ -norm feature selection based on augmented and penalized minimization.
Stat Med. 2018 Feb 10;37(3):473-486. doi: 10.1002/sim.7526. Epub 2017 Oct 30.
7
Robustness of chemometrics-based feature selection methods in early cancer detection and biomarker discovery.
Stat Appl Genet Mol Biol. 2013 Mar 13;12(2):207-23. doi: 10.1515/sagmb-2012-0067.

Cited by

1
Development and evaluation of a chronic kidney disease risk prediction model using random forest.
Front Genet. 2024 Jun 27;15:1409755. doi: 10.3389/fgene.2024.1409755. eCollection 2024.
2
Identification of key biomarkers for STAD using filter feature selection approaches.
Sci Rep. 2022 Nov 18;12(1):19854. doi: 10.1038/s41598-022-21760-w.
3
Advancing tools for human early lifecourse exposome research and translation (ATHLETE): Project overview.
Environ Epidemiol. 2021 Oct 1;5(5):e166. doi: 10.1097/EE9.0000000000000166. eCollection 2021 Oct.
4
Bayesian regularization for a nonstationary Gaussian linear mixed effects model.
Stat Med. 2022 Feb 20;41(4):681-697. doi: 10.1002/sim.9279. Epub 2021 Dec 12.
5
Scanning of Genetic Variants and Genetic Mapping of Phenotypic Traits in Gilthead Sea Bream Through ddRAD Sequencing.
Front Genet. 2019 Aug 6;10:675. doi: 10.3389/fgene.2019.00675. eCollection 2019.
6
Model-free feature screening for categorical outcomes: Nonlinear effect detection and false discovery rate control.
PLoS One. 2019 May 31;14(5):e0217463. doi: 10.1371/journal.pone.0217463. eCollection 2019.
7
Metaheuristic approaches in biopharmaceutical process development data analysis.
Bioprocess Biosyst Eng. 2019 Sep;42(9):1399-1408. doi: 10.1007/s00449-019-02147-0. Epub 2019 May 22.
8
A greedy feature selection algorithm for Big Data of high dimensionality.
Mach Learn. 2019;108(2):149-202. doi: 10.1007/s10994-018-5748-7. Epub 2018 Aug 7.

