Hayn Dieter, Walch Harald, Stieg Jörg, Kreiner Karl, Ebner Hubert, Schreier Günter
AIT Austrian Institute of Technology, Graz, Austria.
TIP Unternehmensberatung GmbH, Graz, Austria.
Stud Health Technol Inform. 2017;236:328-335.
Machine learning algorithms are a promising way to help physicians deal with the ever-increasing amount of data collected in healthcare each day. However, interpreting the suggestions derived from predictive models can be difficult.
The aim of this work was to quantify the influence of a specific feature on an individual decision proposed by a random forest (RF).
For each decision tree within the RF, the influence of each feature on a specific decision (FID) was quantified. For each feature, the changes in outcome value due to that feature were summed along the decision path. Results from all trees in the RF were statistically merged. The ratio of FID to the respective feature's global importance was calculated (FIDrel).
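The per-tree procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: each tree node stores the mean outcome of the training samples reaching it, the change in node value at each split is attributed to the splitting feature and summed along the path, per-tree results are merged with a plain mean, and the global importances are hypothetical numbers supplied by hand. The helper names (`leaf`, `split`, `tree_fid`, `forest_fid`, `fid_rel`) are invented for illustration.

```python
from statistics import mean

# Hypothetical toy tree format: every node stores "value" (mean outcome of the
# training samples reaching it); internal nodes also store a split rule.
def leaf(value):
    return {"value": value}

def split(value, feature, threshold, left, right):
    return {"value": value, "feature": feature, "threshold": threshold,
            "left": left, "right": right}

def tree_fid(tree, x, n_features):
    """Per feature, sum the changes in node value along the path taken by x."""
    fid = [0.0] * n_features
    node = tree
    while "feature" in node:  # stop once a leaf is reached
        go_left = x[node["feature"]] <= node["threshold"]
        child = node["left"] if go_left else node["right"]
        # Attribute the change in outcome value to the splitting feature.
        fid[node["feature"]] += child["value"] - node["value"]
        node = child
    return fid

def forest_fid(trees, x, n_features):
    """Merge per-tree influences; using the mean is an assumption."""
    per_tree = [tree_fid(t, x, n_features) for t in trees]
    return [mean(column) for column in zip(*per_tree)]

def fid_rel(fid, global_importance):
    """Ratio of FID to global importance (0.0 if the importance is zero, by assumption)."""
    return [f / g if g > 0 else 0.0 for f, g in zip(fid, global_importance)]

# Two hand-built trees over two features, and one input sample.
tree_a = split(0.5, 0, 2.0, leaf(0.25),
               split(0.75, 1, 1.0, leaf(0.5), leaf(1.0)))
tree_b = split(0.5, 1, 1.0, leaf(0.25), leaf(0.75))
trees = [tree_a, tree_b]
x = [3.0, 2.0]

fid = forest_fid(trees, x, n_features=2)
rel = fid_rel(fid, global_importance=[0.5, 0.5])  # hypothetical importances
print(fid, rel)
```

Under these assumptions, the per-feature contributions within one tree add up to the difference between the leaf value and the root value, which makes the decomposition easy to check.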
Global feature importance, FID and FIDrel differed significantly, depending on the individual input data. Therefore, we suggest presenting the most important features as determined by FID and by FIDrel whenever the results of an RF are visualized.
Feature influence on a specific decision can be quantified in RFs. Further studies will be necessary to evaluate our approach in a real-world scenario.