Brief Bioinform. 2011 Jul;12(4):369-73. doi: 10.1093/bib/bbr016. Epub 2011 Apr 15.
A recent study examined the stability of predictor rankings from random forests under two variable importance measures, mean decrease accuracy (MDA) and mean decrease Gini (MDG), and concluded that rankings based on MDG were more robust than those based on MDA. However, few studies have examined how data-specific characteristics affect ranking stability. Rankings based on the MDG measure were sensitive to within-predictor correlation and to differences in category frequencies, even when the number of categories was held constant, and thus may produce spurious results. The MDA measure was robust to these data characteristics. Further, under strong within-predictor correlation, MDG rankings were less stable than MDA rankings.
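For readers unfamiliar with the two measures, the following is a minimal sketch of the quantities behind them, assuming the standard textbook definitions. It is not the code used in the study. All function names are hypothetical, and `predict` stands in for any fixed classifier. A real random forest computes the permutation score per tree on out-of-bag samples, and the exact weighting of impurity decreases varies by implementation.

```python
import random
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease from splitting `parent` into `left` and `right`.

    MDG-style importances accumulate this quantity over every split that
    uses a predictor and then average over trees (assumed definition).
    """
    n = len(parent)
    children = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - children


def accuracy(predict, rows, labels):
    """Fraction of rows for which `predict` returns the true label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_accuracy_drop(predict, rows, labels, col, seed=0):
    """MDA-style score: accuracy lost after shuffling one predictor column."""
    rng = random.Random(seed)
    shuffled = [r[col] for r in rows]
    rng.shuffle(shuffled)
    permuted = []
    for r, value in zip(rows, shuffled):
        new_row = list(r)
        new_row[col] = value  # break the link between this column and the label
        permuted.append(new_row)
    return accuracy(predict, rows, labels) - accuracy(predict, permuted, labels)


# Toy data: column 0 determines the label, column 1 is unrelated to it.
rows = [[0, 1], [0, 0], [1, 1], [1, 0]]
labels = [0, 0, 1, 1]


def predict(row):
    """Hypothetical classifier that uses only column 0."""
    return row[0]


informative_split = gini_decrease(labels, [0, 0], [1, 1])   # split on column 0
uninformative_split = gini_decrease(labels, [0, 1], [0, 1])  # split on column 1
noise_drop = permutation_accuracy_drop(predict, rows, labels, col=1)
```

In this toy example the split on the informative column removes all impurity, while the split on the unrelated column removes none, and shuffling the unrelated column leaves accuracy unchanged. The sensitivity described in the abstract arises in less clean settings, where impurity decreases on uninformative predictors are not exactly zero.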