Boulesteix Anne-Laure, Janitza Silke, Hapfelmeier Alexander, Van Steen Kristel, Strobl Carolin
Brief Bioinform. 2015 Mar;16(2):338-45. doi: 10.1093/bib/bbu012. Epub 2014 Apr 9.
In an interesting and quite exhaustive review of Random Forest (RF) methodology in bioinformatics, Touw et al. address, among other topics, the problem of detecting interactions between variables using RF methodology. We feel that some important statistical concepts, such as 'interaction', 'conditional dependence' or 'correlation', are sometimes employed inconsistently in the bioinformatics literature in general and in the literature on RF in particular. In this letter to the Editor, we aim to clarify some of the central statistical concepts and point out some confusing interpretations concerning RF given by Touw et al. and other authors.