Calibrating random forests for probability estimation.

Authors

Dankowski Theresa, Ziegler Andreas

Affiliations

Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Lübeck, Germany.

Zentrum für Klinische Studien, Universität zu Lübeck, Lübeck, Germany.

Publication information

Stat Med. 2016 Sep 30;35(22):3949-60. doi: 10.1002/sim.6959. Epub 2016 Apr 13.

Abstract

Probabilities can be consistently estimated using random forests. It is, however, unclear how random forests should be updated to make predictions for other centers or at different time points. In this work, we present two approaches for updating random forests for probability estimation. The first method has been proposed by Elkan and may be used for updating any machine learning approach yielding consistent probabilities, so-called probability machines. The second approach is a new strategy specifically developed for random forests. Using the terminal nodes, which represent conditional probabilities, the random forest is first translated to logistic regression models. These are, in turn, used for re-calibration. The two updating strategies were compared in a simulation study and are illustrated with data from the German Stroke Study Collaboration. In most simulation scenarios, both methods led to similar improvements. In the simulation scenario in which the stricter assumptions of Elkan's method were not met, the logistic regression-based re-calibration approach for random forests outperformed Elkan's method. It also performed better on the stroke data than Elkan's method. The strength of Elkan's method is its general applicability to any probability machine. However, if the strict assumptions underlying this approach are not met, the logistic regression-based approach is preferable for updating random forests for probability estimation. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
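To make the first updating strategy concrete, here is a minimal sketch of a base-rate adjustment. It assumes that "Elkan's method" refers to the prior-shift formula from Elkan's work on cost-sensitive learning, and that the "stricter assumptions" amount to only the outcome prevalence differing between the original and the new setting. The abstract does not spell out either point. The function names and the example prevalences are hypothetical and are not taken from the paper.

```python
import math


def adjust_for_base_rate(p, b_old, b_new):
    """Rescale a probability p, estimated where the outcome prevalence was
    b_old, to a setting with prevalence b_new.

    Assumption (not stated in the abstract): only the prevalence changes,
    while the distribution of predictors within each outcome class is the same.
    """
    if not (0.0 < b_old < 1.0 and 0.0 < b_new < 1.0):
        raise ValueError("base rates must lie strictly between 0 and 1")
    if not (0.0 <= p <= 1.0):
        raise ValueError("p must be a probability")
    numerator = b_new * (p - p * b_old)
    denominator = b_old - p * b_old + b_new * p - b_old * b_new
    return numerator / denominator


def adjust_on_logit_scale(p, b_old, b_new):
    """Equivalent form for 0 < p < 1: shift the log-odds by the difference
    in the log-odds of the two base rates."""
    def logit(x):
        return math.log(x / (1.0 - x))

    shifted = logit(p) + logit(b_new) - logit(b_old)
    return 1.0 / (1.0 + math.exp(-shifted))


# Hypothetical example: a forest trained where 50% of patients had the
# outcome is applied at a center where only 20% do.
for p in (0.2, 0.5, 0.8):
    print(p, adjust_for_base_rate(p, b_old=0.5, b_new=0.2))
```

The logit form shows that, under this assumption, the update is a pure intercept shift. A re-calibration that also re-estimates slopes, such as a logistic regression fitted on the new data, can additionally correct for changes in how predictors relate to the outcome.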

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/68e2/5074325/2cbed0e24892/SIM-35-3949-g001.jpg
