A novel random forests-based feature selection method for microarray expression data analysis.

Author information

Yao Dengju, Yang Jing, Zhan Xiaojuan, Zhan Xiaorong, Xie Zhiqiang

Publication information

Int J Data Min Bioinform. 2015;13(1):84-101. doi: 10.1504/ijdmb.2015.070852.

Abstract

High-dimensional data and large numbers of redundant features in bioinformatics research have created an urgent need for feature selection. In this paper, a novel random forests-based feature selection method is proposed that adopts the idea of stratifying the feature space and combines generalised sequential backward searching and generalised sequential forward searching strategies. A random forest variable importance score is used to rank features, and different classifiers are used as the feature subset evaluation function. The proposed method is examined on five microarray expression datasets (leukaemia, prostate, breast, nervous system and DLBCL), on which the average accuracies of the SVM classifier are 100%, 95.24%, 85%, 91.67% and 91.67%, respectively. The results show that the proposed method could not only improve classification accuracy but also greatly reduce the computation time of the feature selection process.
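
The abstract names the ingredients of the method (importance ranking, a stratified feature space, generalised sequential backward and forward search, a classifier as subset evaluator) but not how they are wired together. The Python sketch that follows is one hedged reading of that combination, not the authors' published algorithm: the equal-size strata, the order "backward search on the top stratum, then forward search through lower strata", the step size r, and all function names (stratify, generalized_sbs, generalized_sfs, stratified_select, toy_evaluate) are hypothetical. Random forest importance scores are taken as a plain input list (in practice they might come from, e.g., scikit-learn's RandomForestClassifier.feature_importances_), and toy_evaluate is a made-up stand-in for a classifier such as an SVM scoring a feature subset.

```python
from itertools import combinations
from typing import Callable, List, Sequence

# Hypothetical evaluator: maps a list of feature indices to a score to maximise,
# e.g. the cross-validated accuracy of an SVM trained on those features.
Evaluator = Callable[[List[int]], float]


def stratify(importances: Sequence[float], n_strata: int) -> List[List[int]]:
    """Rank features by importance (highest first) and split the ranking
    into n_strata contiguous strata of near-equal size (assumed scheme)."""
    order = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)
    size, extra = divmod(len(order), n_strata)
    strata, start = [], 0
    for s in range(n_strata):
        end = start + size + (1 if s < extra else 0)
        strata.append(order[start:end])
        start = end
    return strata


def generalized_sbs(subset: List[int], evaluate: Evaluator, r: int) -> List[int]:
    """Generalised sequential backward search: at each step remove the
    r-feature combination whose removal scores best, while the score
    does not drop."""
    current, best = list(subset), evaluate(list(subset))
    while len(current) > r:
        scored = []
        for drop in combinations(current, r):
            kept = [f for f in current if f not in drop]
            scored.append((evaluate(kept), kept))
        score, kept = max(scored, key=lambda t: t[0])
        if score < best:
            break
        current, best = kept, score
    return current


def generalized_sfs(selected: List[int], pool: Sequence[int],
                    evaluate: Evaluator, r: int) -> List[int]:
    """Generalised sequential forward search: at each step add the
    r-feature combination from pool that scores best, while the score
    strictly improves."""
    current, best = list(selected), evaluate(list(selected))
    remaining = [f for f in pool if f not in current]
    while len(remaining) >= r:
        scored = [(evaluate(current + list(add)), list(add))
                  for add in combinations(remaining, r)]
        score, add = max(scored, key=lambda t: t[0])
        if score <= best:
            break
        current, best = current + add, score
        remaining = [f for f in remaining if f not in add]
    return current


def stratified_select(importances: Sequence[float], evaluate: Evaluator,
                      n_strata: int = 3, r_back: int = 1, r_fwd: int = 1) -> List[int]:
    """Prune the most important stratum backward, then grow the subset
    forward from each lower stratum in turn. r_back / r_fwd > 1 give the
    generalised (r-at-a-time) variants."""
    strata = stratify(importances, n_strata)
    selected = generalized_sbs(strata[0], evaluate, r_back)
    for stratum in strata[1:]:
        selected = generalized_sfs(selected, stratum, evaluate, r_fwd)
    return sorted(selected)


# Toy data (invented): pretend features 0, 3 and 5 are informative; the list
# below stands in for random forest importance scores.
importances = [0.30, 0.05, 0.02, 0.20, 0.10, 0.15, 0.01, 0.17]
informative = {0, 3, 5}


def toy_evaluate(features: List[int]) -> float:
    # Reward each informative feature, penalise each uninformative one.
    hits = sum(1 for f in features if f in informative)
    return 10 * hits - (len(features) - hits)


print(stratify(importances, 3))                         # [[0, 3, 7], [5, 4, 1], [2, 6]]
print(stratified_select(importances, toy_evaluate))         # [0, 3, 5]
print(stratified_select(importances, toy_evaluate, r_fwd=2))  # [0, 3, 4, 5]
```

With single-step search the toy run drops feature 7 from the top stratum and then adds only feature 5. With r_fwd=2 the forward step must add two features at once, so the uninformative feature 4 rides along with 5; this illustrates the usual trade-off of generalised search, where larger r can capture feature interactions but takes coarser steps and costs more evaluations per step.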
