Gene selection using iterative feature elimination random forests for survival outcomes.

Affiliation

Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27705, USA.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2012 Sep-Oct;9(5):1422-31. doi: 10.1109/TCBB.2012.63.

DOI:10.1109/TCBB.2012.63

PMID:22547432

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3495190/

Abstract

Although many feature selection methods for classification have been developed, there is a need to identify genes in high-dimensional data with censored survival outcomes. Traditional methods for gene selection in classification problems have several drawbacks. First, the majority of gene selection approaches for classification are single-gene based. Second, many gene selection procedures are not embedded within the algorithm itself. Random forests have been found to perform well in high-dimensional data settings with survival outcomes, and they have an embedded feature for identifying variables of importance. They are therefore an ideal candidate for gene selection in high-dimensional data with survival outcomes. In this paper, we develop a novel method based on random forests to identify a set of prognostic genes. We compare our method with several machine learning methods and various node split criteria using several real data sets. Our method performed well in both simulations and real data analysis. Additionally, we have shown the advantages of our approach over single-gene-based approaches. Our method incorporates multivariate correlations in microarray data for survival outcomes. The described method allows us to better utilize the information available from microarray data with survival outcomes.
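The abstract does not spell out the elimination procedure, so the following is only a minimal sketch of the general idea behind iterative feature elimination for censored outcomes, not the authors' implementation. It assumes a loop that scores all remaining genes, drops the lowest-ranked fraction, and repeats. The names `concordance_index`, `iterative_elimination`, `toy_importance`, the `drop_fraction` parameter, and the toy data are all hypothetical. To stay self-contained, the importance score here is a univariate stand-in based on Harrell's concordance index; in the setting the abstract describes, that function would be replaced by the variable importance of a random survival forest fitted on all remaining genes at once.

```python
from typing import Callable, Dict, List, Sequence


def concordance_index(times: Sequence[float], events: Sequence[int],
                      risk: Sequence[float]) -> float:
    """Harrell's C: share of comparable pairs that the risk score orders correctly.

    A pair (i, j) is comparable when subject i has an observed event and
    fails strictly before subject j's (event or censoring) time.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties in risk count as half
    return concordant / comparable if comparable else 0.5


def iterative_elimination(
    features: Dict[str, List[float]],
    importance_fn: Callable[[Dict[str, List[float]]], Dict[str, float]],
    drop_fraction: float = 0.2,
    min_features: int = 1,
) -> List[List[str]]:
    """Repeatedly rank the remaining features and drop the least important ones.

    Returns the ranked feature list at every iteration, largest set first.
    """
    current = list(features)
    history: List[List[str]] = []
    while True:
        scores = importance_fn({g: features[g] for g in current})
        ranked = sorted(current, key=lambda g: scores[g], reverse=True)
        history.append(ranked)
        if len(ranked) <= min_features:
            break
        n_drop = max(1, int(len(ranked) * drop_fraction))
        current = ranked[:max(min_features, len(ranked) - n_drop)]
    return history


# Toy data (made up for illustration): 6 subjects, 3 "genes".
times = [2, 4, 5, 7, 9, 11]
events = [1, 1, 0, 1, 1, 0]  # 1 = event observed, 0 = censored
genes = {
    "gene_a": [6, 5, 4, 3, 2, 1],  # higher expression, earlier event
    "gene_b": [1, 3, 2, 3, 2, 1],  # weak signal
    "gene_c": [1, 1, 1, 1, 1, 1],  # constant, carries no information
}


def toy_importance(subset: Dict[str, List[float]]) -> Dict[str, float]:
    # Stand-in scorer (assumption): distance of each gene's C-index from 0.5.
    # A random survival forest's variable importance would go here instead.
    return {g: abs(concordance_index(times, events, x) - 0.5)
            for g, x in subset.items()}


history = iterative_elimination(genes, toy_importance, drop_fraction=0.34)
for step, ranked in enumerate(history):
    print(step, ranked)
```

In practice each entry of `history` would also be scored with an out-of-sample error estimate, and the gene set with the best estimate would be reported; that selection step is left out of this sketch.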
