

Surrogate minimal depth as an importance measure for variables in random forests.

Affiliation

Institute of Medical Informatics and Statistics, Kiel University, University Hospital Schleswig-Holstein, Kiel, Germany.

Publication information

Bioinformatics. 2019 Oct 1;35(19):3663-3671. doi: 10.1093/bioinformatics/btz149.

Abstract

MOTIVATION

It has been shown that the machine learning approach random forest can be successfully applied to omics data, such as gene expression data, for classification or regression and to select variables that are important for prediction. However, the complex relationships between predictor variables, in particular between causal predictor variables, make the interpretation of currently applied variable selection techniques difficult.

RESULTS

Here we propose a new variable selection approach called surrogate minimal depth (SMD) that incorporates surrogate variables into the concept of minimal depth (MD) variable importance. Applying SMD, we show that simulated correlation patterns can be reconstructed and that the increased consideration of variable relationships improves variable selection. When compared with existing state-of-the-art methods and MD, SMD has higher empirical power to identify causal variables while the resulting variable lists are equally stable. In conclusion, SMD is a promising approach to get more insight into the complex interplay of predictor variables and outcome in a high-dimensional data setting.
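To illustrate the idea of adding surrogate variables to minimal depth, here is a toy Python sketch. It is not the authors' implementation, which is available at the repository listed below. Each tree is a small hand-built structure, and the root has depth 0. A variable's minimal depth in a tree is the shallowest node where it is used. Plain MD counts only primary split variables, while the SMD-style variant also counts the surrogate variables stored at each node. Several details here are simplifying assumptions rather than the published definition: how surrogates are chosen and how many are kept, the hypothetical helper names (`Node`, `tree_min_depths`, `forest_min_depth`), and the rule that a variable absent from a tree scores that tree's maximal split depth + 1. The selection threshold is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Internal split node: primary split variable plus its surrogate variables."""
    primary: str
    surrogates: List[str] = field(default_factory=list)
    left: Optional["Node"] = None   # None means the child is a terminal node
    right: Optional["Node"] = None


def tree_min_depths(root: Node, use_surrogates: bool) -> Dict[str, int]:
    """Smallest depth (root = 0) at which each variable is used in one tree.

    use_surrogates=False: only primary split variables count (plain MD).
    use_surrogates=True: surrogate variables count as well (SMD-style idea).
    """
    depths: Dict[str, int] = {}
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        used = [node.primary] + (node.surrogates if use_surrogates else [])
        for var in used:
            if var not in depths or depth < depths[var]:
                depths[var] = depth
        for child in (node.left, node.right):
            if child is not None:
                stack.append((child, depth + 1))
    return depths


def max_split_depth(root: Node) -> int:
    """Depth of the deepest internal (split) node in the tree."""
    children = [c for c in (root.left, root.right) if c is not None]
    return 0 if not children else 1 + max(max_split_depth(c) for c in children)


def forest_min_depth(trees: List[Node], variables: List[str],
                     use_surrogates: bool) -> Dict[str, float]:
    """Mean minimal depth over all trees; lower values mean more important.

    Assumption of this sketch: a variable that never appears in a tree is
    assigned that tree's maximal split depth + 1.
    """
    totals = {v: 0.0 for v in variables}
    for tree in trees:
        depths = tree_min_depths(tree, use_surrogates)
        missing = max_split_depth(tree) + 1
        for v in variables:
            totals[v] += depths.get(v, missing)
    return {v: totals[v] / len(trees) for v in variables}


# Two toy trees in which x4 only ever appears as a surrogate variable.
trees = [
    Node("x1", ["x2"], left=Node("x3", ["x4"])),
    Node("x3", ["x1"], left=Node("x2", ["x4"])),
]
variables = ["x1", "x2", "x3", "x4", "x5"]
md = forest_min_depth(trees, variables, use_surrogates=False)
smd = forest_min_depth(trees, variables, use_surrogates=True)
print(md)   # {'x1': 1.0, 'x2': 1.5, 'x3': 0.5, 'x4': 2.0, 'x5': 2.0}
print(smd)  # {'x1': 0.0, 'x2': 0.5, 'x3': 0.5, 'x4': 1.0, 'x5': 2.0}
```

In this toy example, x4 never appears as a primary split variable, so plain MD ranks it no better than the unused x5. Counting surrogates moves it ahead of x5. This is the intuition behind considering variable relationships when selecting variables.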

AVAILABILITY AND IMPLEMENTATION

https://github.com/StephanSeifert/SurrogateMinimalDepth.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.


