Random Forest

To analyze survival data and identify risk factors associated with mortality, physicians, researchers, and biostatisticians have typically relied on regression techniques, most notably the Cox model. As computing power has become more widely available, methods that require more complex mathematics have become increasingly common. Particularly in this era of "big data" and machine learning, survival analysis has become methodologically broader. This paper explores one such technique, Random Forest, a regression tree technique that uses bootstrap aggregation and randomization of predictors to achieve a high degree of predictive accuracy. The various input parameters of the random forest are explored. Colon cancer data (n = 66,807) from the SEER database are then used to construct both a Cox model and a random forest model and to compare how well the two models perform on the same data. Both models perform well, achieving a concordance error rate of approximately 18%.
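The abstract does not say how the concordance error rate is computed. As a minimal sketch, assuming it is defined as 1 minus Harrell's concordance index (C), the function below counts comparable pairs of subjects and checks whether the subject who died earlier was assigned the higher predicted risk. The function name `concordance_error` and the toy data are hypothetical illustrations, not taken from the paper or from the SEER data.

```python
def concordance_error(times, events, risk_scores):
    """Return 1 - C, where C is Harrell's concordance index.

    times       : observed follow-up time per subject
    events      : 1 if death was observed, 0 if the subject was censored
    risk_scores : predicted risk (higher means earlier expected death)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i has an observed event
            # strictly before subject j's observed time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # model ranked the pair correctly
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return 1.0 - concordant / comparable


# Toy data (invented for illustration): four subjects, one censored.
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
risk = [4, 2, 3, 1]   # one pair is ranked in the wrong order
error = concordance_error(times, events, risk)
```

Under this assumed definition, an error rate of about 18% would correspond to a concordance index of about 0.82, meaning the model orders roughly four out of five comparable pairs correctly.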

