RIFS: a randomly restarted incremental feature selection algorithm.

Affiliation

College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China.

Publication information

Sci Rep. 2017 Oct 12;7(1):13013. doi: 10.1038/s41598-017-13259-6.

DOI:10.1038/s41598-017-13259-6

PMID:29026108

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5638869/

Abstract

The advent of the big data era has imposed both running-time and learning-efficiency challenges on machine learning researchers. Biomedical OMIC research is one of these big data areas and has changed biomedical research drastically. However, the high cost of data production and the difficulty of participant recruitment introduce the "large p, small n" paradigm into biomedical research. Feature selection is usually employed to reduce the large number of biomedical features, so that a stable, data-independent classification or regression model may be achieved. This study randomly changes the first element of the widely used incremental feature selection (IFS) strategy and selects the best feature subset, which may include features ranked low by statistical association evaluation algorithms, e.g. the t-test. The hypothesis is that two low-ranked features may be orchestrated to achieve good classification performance. The proposed Randomly re-started Incremental Feature Selection (RIFS) algorithm demonstrates both higher classification accuracy and a smaller number of features than the existing algorithms. RIFS also outperforms the existing methylomic diagnosis model for prostate malignancy, with higher accuracy and a lower number of transcriptomic features.
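The idea in the abstract, restarting incremental feature selection from random positions in a t-test ranking, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the classifier, the stopping rule, or the number of restarts, so the nearest-centroid classifier, the leave-one-out evaluation, and the hypothetical parameters `n_restarts` and `patience` are all assumptions made for illustration. The sketch also assumes binary labels (0/1) with at least two samples per class.

```python
import math
import random
from statistics import mean, variance

def t_statistic(a, b):
    """Absolute Welch t-statistic of one feature between two classes."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0

def rank_features(X, y):
    """Feature indices sorted from the strongest to the weakest t-statistic."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(t_statistic(pos, neg))
    return sorted(range(len(scores)), key=lambda j: -scores[j])

def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of a nearest-centroid classifier (assumed stand-in)."""
    correct = 0
    for i in range(len(X)):
        dist = {}
        for label in (0, 1):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroid = [mean(r[j] for r in rows) for j in feats]
            dist[label] = sum((X[i][j] - m) ** 2 for j, m in zip(feats, centroid))
        correct += int(min(dist, key=dist.get) == y[i])
    return correct / len(X)

def rifs(X, y, n_restarts=5, patience=2, seed=0):
    """Run IFS from rank 0 and from random ranks; keep the best subset found."""
    rng = random.Random(seed)
    ranked = rank_features(X, y)
    # Start 0 reproduces classic IFS; the others are the random restarts.
    starts = [0] + [rng.randrange(len(ranked)) for _ in range(n_restarts)]
    best_feats, best_acc = [], -1.0
    for start in starts:
        feats, run_best, stale = [], -1.0, 0
        for j in ranked[start:]:
            feats.append(j)  # add the next consecutively ranked feature
            acc = loo_accuracy(X, y, feats)
            if acc > run_best:
                run_best, stale = acc, 0
                # Prefer higher accuracy, then fewer features.
                if (acc, -len(feats)) > (best_acc, -len(best_feats)):
                    best_feats, best_acc = list(feats), acc
            else:
                stale += 1
                if stale >= patience:  # assumed stopping rule
                    break
    return best_feats, best_acc

# Toy data: feature 0 separates the classes, features 1 and 2 do not.
X = [[0.10, 5.0, 1.00], [0.20, 4.0, 1.10], [0.15, 5.5, 0.90],
     [0.90, 4.5, 1.05], [1.00, 5.2, 0.95], [0.95, 4.8, 1.00]]
y = [0, 0, 0, 1, 1, 1]
feats, acc = rifs(X, y)
print(feats, acc)
```

Because each restart begins at a random rank, a subset made up entirely of low-ranked features gets evaluated even when classic IFS, which always starts at rank 0, would never consider it on its own.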

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c5dc/5638869/415ee9bc1cf7/41598_2017_13259_Fig1_HTML.jpg

Similar articles

RIFS2D: A two-dimensional version of a randomly restarted incremental feature selection algorithm with an application for detecting low-ranked biomarkers.

Comput Biol Med. 2021 Jun;133:104405. doi: 10.1016/j.compbiomed.2021.104405. Epub 2021 Apr 17.

An OMIC biomarker detection algorithm TriVote and its application in methylomic biomarker detection.

Epigenomics. 2018 Apr;10(4):335-347. doi: 10.2217/epi-2017-0097. Epub 2018 Jan 19.

Zoo: Selecting Transcriptomic and Methylomic Biomarkers by Ensembling Animal-Inspired Swarm Intelligence Feature Selection Algorithms.

Genes (Basel). 2021 Nov 18;12(11):1814. doi: 10.3390/genes12111814.

McTwo: a two-step feature selection algorithm based on maximal information coefficient.

BMC Bioinformatics. 2016 Mar 23;17:142. doi: 10.1186/s12859-016-0990-0.

An Efficient Feature Subset Selection Algorithm for Classification of Multidimensional Dataset.

ScientificWorldJournal. 2015;2015:821798. doi: 10.1155/2015/821798. Epub 2015 Sep 28.

BioDog, biomarker detection for improving identification power of breast cancer histologic grade in methylomics.

Epigenomics. 2019 Nov 1;11(15):1717-1732. doi: 10.2217/epi-2019-0230. Epub 2019 Oct 18.

A novel feature selection approach for biomedical data classification.

J Biomed Inform. 2010 Feb;43(1):15-23. doi: 10.1016/j.jbi.2009.07.008. Epub 2009 Jul 30.

Rough sets and Laplacian score based cost-sensitive feature selection.

PLoS One. 2018 Jun 18;13(6):e0197564. doi: 10.1371/journal.pone.0197564. eCollection 2018.

Feature selection and nearest centroid classification for protein mass spectrometry.

BMC Bioinformatics. 2005 Mar 23;6:68. doi: 10.1186/1471-2105-6-68.

Cited by

In Silico Analysis Uncovers FOXA1 as a Potential Biomarker for Predicting Neoadjuvant Chemotherapy Response in Fine-Needle Aspiration Biopsies.

J Cancer. 2024 Sep 30;15(18):6052-6072. doi: 10.7150/jca.101901. eCollection 2024.

Identification of biomarkers for hepatocellular carcinoma based on single cell sequencing and machine learning algorithms.

Front Genet. 2022 Oct 24;13:873218. doi: 10.3389/fgene.2022.873218. eCollection 2022.

MuscNet, a Weighted Voting Model of Multi-Source Connectivity Networks to Predict Mild Cognitive Impairment Using Resting-State Functional MRI.

IEEE Access. 2020;8:174023-174031. doi: 10.1109/access.2020.3025828. Epub 2020 Sep 22.

Zoo: Selecting Transcriptomic and Methylomic Biomarkers by Ensembling Animal-Inspired Swarm Intelligence Feature Selection Algorithms.

Genes (Basel). 2021 Nov 18;12(11):1814. doi: 10.3390/genes12111814.

The Unsupervised Feature Selection Algorithms Based on Standard Deviation and Cosine Similarity for Genomic Data Analysis.

Front Genet. 2021 May 13;12:684100. doi: 10.3389/fgene.2021.684100. eCollection 2021.

EnRank: An Ensemble Method to Detect Pulmonary Hypertension Biomarkers Based on Feature Selection and Machine Learning Models.

Front Genet. 2021 Apr 27;12:636429. doi: 10.3389/fgene.2021.636429. eCollection 2021.

Detection and Comparative Analysis of Methylomic Biomarkers of Rheumatoid Arthritis.

Front Genet. 2020 Mar 27;11:238. doi: 10.3389/fgene.2020.00238. eCollection 2020.

AgeGuess, a Methylomic Prediction Model for Human Ages.

Front Bioeng Biotechnol. 2020 Mar 10;8:80. doi: 10.3389/fbioe.2020.00080. eCollection 2020.

Age Is Important for the Early-Stage Detection of Breast Cancer on Both Transcriptomic and Methylomic Biomarkers.

Front Genet. 2019 Mar 26;10:212. doi: 10.3389/fgene.2019.00212. eCollection 2019.

Selecting Multiple Biomarker Subsets with Similarly Effective Binary Classification Performances.

J Vis Exp. 2018 Oct 11(140):57738. doi: 10.3791/57738.
