Cutler Adele, Stevens John R
Department of Mathematics and Statistics, Utah State University, Logan, UT, USA.
Methods Enzymol. 2006;411:422-32. doi: 10.1016/S0076-6879(06)11023-X.
Random Forests is a powerful multipurpose tool for predicting and understanding data. If gene expression data come from known groups or classes (e.g., tumor patients and controls), Random Forests can rank the genes in terms of their usefulness in separating the groups. When the groups are unknown, Random Forests uses an intrinsic measure of the similarity of the genes to extract useful multivariate structure, including clusters. This chapter summarizes the Random Forests methodology and illustrates its use on freely available data sets.
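The two uses described above can be sketched in Python. This is a minimal illustration, not the chapter's own code. It assumes scikit-learn's `RandomForestClassifier` as the implementation and uses a hypothetical, synthetic expression matrix in which genes 0 to 2 are made informative by construction. Genes are ranked by scikit-learn's impurity-based importance, which is only one of several possible importance measures. For the case of unknown groups, the sketch assumes the common construction in which the real data are contrasted with a column-permuted synthetic copy, and similarity is taken as the fraction of trees in which two samples share a terminal node.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical expression matrix: 60 samples x 50 genes of standard normal noise.
n_samples, n_genes = 60, 50
X = rng.normal(size=(n_samples, n_genes))
y = np.repeat([0, 1], n_samples // 2)  # 0 = control, 1 = tumor (made-up labels)
X[y == 1, :3] += 3.0                   # genes 0-2 separate the groups by construction

# Known groups: fit a forest and rank genes by impurity-based importance.
forest = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
ranking = np.argsort(forest.feature_importances_)[::-1]
top_genes = ranking[:5].tolist()

# Unknown groups: build a synthetic copy by permuting each gene independently,
# which keeps the marginal distributions but breaks the joint structure,
# then train a forest to tell real rows from synthetic rows.
X_synth = np.column_stack([rng.permutation(X[:, j]) for j in range(n_genes)])
X_both = np.vstack([X, X_synth])
is_real = np.repeat([1, 0], n_samples)
uforest = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_both, is_real)

# Proximity of two real samples: fraction of trees where they land in the same leaf.
leaves = uforest.apply(X)  # shape (n_samples, n_trees)
proximity = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)

print("Top-ranked genes:", top_genes)
print("Proximity matrix shape:", proximity.shape)
```

The quantity `1 - proximity` can then be passed as a dissimilarity to a clustering or scaling method to look for multivariate structure.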