Pathway analysis using random forests with bivariate node-split for survival outcomes.

Author information

Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710, USA.

Publication information

Bioinformatics. 2010 Jan 15;26(2):250-8. doi: 10.1093/bioinformatics/btp640. Epub 2009 Nov 18.

DOI:10.1093/bioinformatics/btp640

PMID:19933158

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC2804301/

Abstract

MOTIVATION

There is great interest in the research community in pathway-based methods for genomics data analysis. Although machine learning methods, such as random forests, have been developed to correlate survival outcomes with a set of genes, no study has assessed how well these methods incorporate pathway information when analyzing microarray data. In general, genes that are identified without incorporating biological knowledge are more difficult to interpret. Correlating pathway-based gene expression with survival outcomes may lead to more biologically meaningful prognostic biomarkers. Thus, a comprehensive study of how these methods perform in a pathway-based setting is warranted.

RESULTS

In this article, we describe a pathway-based method that uses random forests to correlate gene expression data with survival outcomes, and we introduce a novel bivariate node-splitting random survival forest. The proposed method allows researchers to identify pathways that are important for predicting patient prognosis and time to disease progression, and to discover important genes within those pathways. We compared several implementations of random forests with different split criteria and found that the bivariate node-splitting random survival forest with the log-rank test is among the best. We also performed simulation studies showing that random forests outperform several other machine learning algorithms and give results comparable to those of a newly developed component-wise Cox boosting model. Thus, pathway-based survival analysis using machine learning tools is a promising approach for dissecting pathways and for generating new biological hypotheses from microarray studies.
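
To make the log-rank split criterion mentioned above concrete, the following is a minimal sketch in plain Python. It is not the Pwayrfsurvival implementation. The function names (`logrank_statistic`, `best_split`, `best_bivariate_split`) are hypothetical, and the bivariate search is an assumption made purely for illustration: the abstract does not define the bivariate split, so the sketch simply scores thresholds on a few fixed weighted combinations of two covariates.

```python
from typing import List, Optional, Sequence, Tuple


def logrank_statistic(times: Sequence[float], events: Sequence[int],
                      in_left: Sequence[bool]) -> float:
    """Two-group log-rank chi-square statistic for one candidate node split.

    events[i] is 1 for an observed event and 0 for a censored observation;
    in_left[i] says whether sample i falls in the left daughter node.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        # Numbers at risk (overall and in the left node) just before time t.
        n = sum(1 for u in times if u >= t)
        n_left = sum(1 for u, g in zip(times, in_left) if g and u >= t)
        # Numbers of events (overall and in the left node) at time t.
        d = sum(1 for u, e in zip(times, events) if e and u == t)
        d_left = sum(1 for u, e, g in zip(times, events, in_left)
                     if e and g and u == t)
        obs_minus_exp += d_left - d * n_left / n
        if n > 1:
            p = n_left / n
            variance += d * p * (1 - p) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


def best_split(x: Sequence[float], times: Sequence[float],
               events: Sequence[int]) -> Tuple[Optional[float], float]:
    """Scan midpoints between sorted distinct values of one covariate and
    return (threshold, statistic) for the largest log-rank statistic."""
    best: Tuple[Optional[float], float] = (None, 0.0)
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2
        stat = logrank_statistic(times, events, [v <= cut for v in x])
        if stat > best[1]:
            best = (cut, stat)
    return best


# Hypothetical direction set for the illustrative bivariate search.
DIRECTIONS: List[Tuple[int, int]] = [(1, 0), (0, 1), (1, 1), (1, -1)]


def best_bivariate_split(x1: Sequence[float], x2: Sequence[float],
                         times: Sequence[float], events: Sequence[int]):
    """Assumed form of a bivariate split: threshold a*x1 + b*x2 for each
    (a, b) in DIRECTIONS and keep the best (direction, threshold, statistic)."""
    best = (None, None, 0.0)
    for a, b in DIRECTIONS:
        z = [a * u + b * v for u, v in zip(x1, x2)]
        cut, stat = best_split(z, times, events)
        if stat > best[2]:
            best = ((a, b), cut, stat)
    return best


if __name__ == "__main__":
    # Toy data: six patients, all with observed events.
    times = [1, 2, 3, 4, 5, 6]
    events = [1, 1, 1, 1, 1, 1]
    gene_a = [0, 2, 1, 3, 5, 4]
    gene_b = [1, 0, 2, 4, 3, 5]
    print(best_split(gene_a, times, events))
    print(best_bivariate_split(gene_a, gene_b, times, events))
```

Because the direction set includes each covariate on its own, the bivariate search in this sketch can never score lower than the better of the two univariate splits. In a forest, such a search would be repeated at every node over a random subset of the genes in a pathway.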

AVAILABILITY

R package Pwayrfsurvival is available from URL: http://www.duke.edu/~hp44/pwayrfsurvival.htm.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
