Risk estimation using probability machines.

Affiliation

Clinical Trials and Outcomes Branch, National Institute of Arthritis, Musculoskeletal and Skin Diseases, National Institutes of Health, Room 4-1350, Bldg 10 CRC, 10 Center Drive, Bethesda, MD 20892-1468, USA.

Publication information

BioData Min. 2014 Mar 1;7(1):2. doi: 10.1186/1756-0381-7-2.

DOI:10.1186/1756-0381-7-2

PMID:24581306

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4015350/

Abstract

BACKGROUND

Logistic regression has been the de facto, and often the only, model used in the description and analysis of relationships between a binary outcome and observed features. It is widely used to obtain the conditional probabilities of the outcome given predictors, as well as predictor effect size estimates using conditional odds ratios.

RESULTS

We show how statistical learning machines for binary outcomes, provably consistent for the nonparametric regression problem, can be used to provide both consistent conditional probability estimation and conditional effect size estimates. Effect size estimates from learning machines leverage our understanding of counterfactual arguments central to the interpretation of such estimates. We show that, if the data generating model is logistic, we can recover accurate probability predictions and effect size estimates with nearly the same efficiency as a correct logistic model, both for main effects and interactions. We also propose a method using learning machines to scan for possible interaction effects quickly and efficiently. Simulations using random forest probability machines are presented.

CONCLUSIONS

The models we propose make no assumptions about the data structure, and capture the patterns in the data by just specifying the predictors involved and not any particular model structure. So they do not run the same risks of model mis-specification and the resultant estimation biases as a logistic model. This methodology, which we call a "risk machine", will share properties from the statistical machine that it is derived from.
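
The counterfactual reading of effect sizes described in the Results can be illustrated with a small simulation. The sketch below is not the authors' implementation: it swaps the random forest for a simple stratified k-nearest-neighbour averager as a stand-in nonparametric probability machine, and the logistic coefficients, sample size, `k`, and all function names are assumptions chosen for illustration. It estimates an average risk difference by predicting each subject's risk twice, once with the binary exposure set to 1 and once with it set to 0, and compares the result with the value implied by the simulated logistic model.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def true_risk(x1, x2):
    # Assumed data-generating logistic model (illustrative coefficients only).
    return sigmoid(-1.0 + 1.0 * x1 + 0.5 * x2)


def simulate(n, seed=0):
    """Draw (x1, x2, y): binary exposure, continuous covariate, binary outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.randint(0, 1)
        x2 = rng.gauss(0.0, 1.0)
        y = 1 if rng.random() < true_risk(x1, x2) else 0
        rows.append((x1, x2, y))
    return rows


def fit_knn_machine(rows, k=50):
    """Stand-in probability machine: within the requested exposure stratum,
    average the 0/1 outcomes of the k training points closest in x2."""
    strata = {0: [], 1: []}
    for x1, x2, y in rows:
        strata[x1].append((x2, y))

    def predict(x1, x2):
        nearest = sorted(strata[x1], key=lambda r: abs(r[0] - x2))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def counterfactual_risk_difference(predict, rows):
    """Average over subjects of predicted risk with x1 = 1 minus x1 = 0,
    holding each subject's own x2 fixed."""
    diffs = [predict(1, x2) - predict(0, x2) for _, x2, _ in rows]
    return sum(diffs) / len(diffs)


rows = simulate(2000)
machine = fit_knn_machine(rows)
estimate = counterfactual_risk_difference(machine, rows)
truth = sum(true_risk(1, x2) - true_risk(0, x2) for _, x2, _ in rows) / len(rows)
print(f"estimated risk difference: {estimate:.3f}, model-based value: {truth:.3f}")
```

Any learner that returns consistent conditional probability estimates could replace `fit_knn_machine` here; only the two counterfactual prediction passes are specific to the effect-size calculation.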

