Calibrating random forests for probability estimation.

Authors

Dankowski Theresa, Ziegler Andreas

Affiliations

Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Lübeck, Germany.

Zentrum für Klinische Studien, Universität zu Lübeck, Lübeck, Germany.

Publication information

Stat Med. 2016 Sep 30;35(22):3949-60. doi: 10.1002/sim.6959. Epub 2016 Apr 13.

DOI:10.1002/sim.6959

PMID:27074747

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5074325/

Abstract

Probabilities can be consistently estimated using random forests. It is, however, unclear how random forests should be updated to make predictions for other centers or at different time points. In this work, we present two approaches for updating random forests for probability estimation. The first method has been proposed by Elkan and may be used for updating any machine learning approach yielding consistent probabilities, so-called probability machines. The second approach is a new strategy specifically developed for random forests. Using the terminal nodes, which represent conditional probabilities, the random forest is first translated to logistic regression models. These are, in turn, used for re-calibration. The two updating strategies were compared in a simulation study and are illustrated with data from the German Stroke Study Collaboration. In most simulation scenarios, both methods led to similar improvements. In the simulation scenario in which the stricter assumptions of Elkan's method were not met, the logistic regression-based re-calibration approach for random forests outperformed Elkan's method. It also performed better on the stroke data than Elkan's method. The strength of Elkan's method is its general applicability to any probability machine. However, if the strict assumptions underlying this approach are not met, the logistic regression-based approach is preferable for updating random forests for probability estimation. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
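The abstract does not spell out the formula behind Elkan's method. A minimal sketch follows, assuming it refers to the base-rate correction from Elkan's work on cost-sensitive learning: a predicted probability is shifted from the base rate of the original data to the base rate of the new center or time point, under the assumption that only the base rate changed. The function name `elkan_adjust`, its parameter names, and the numbers in the demo are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
def elkan_adjust(p: float, base_old: float, base_new: float) -> float:
    """Shift a predicted probability from an old base rate to a new one.

    Sketch of a base-rate (prior shift) correction. It assumes the
    distribution of the predictors within each outcome class is the same
    in the old and the new setting, so that only the base rate differs.
    """
    for name, value in (("base_old", base_old), ("base_new", base_new)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must lie strictly between 0 and 1")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")

    # Unnormalized weight of the positive class under the new base rate.
    pos = p * base_new * (1.0 - base_old)
    # Unnormalized weight of the negative class under the new base rate.
    neg = (1.0 - p) * (1.0 - base_new) * base_old
    return pos / (pos + neg)


if __name__ == "__main__":
    # Hypothetical setting: the model was trained where half of the
    # patients had the outcome and is applied where only 20% do.
    for p in (0.1, 0.5, 0.9):
        print(p, round(elkan_adjust(p, base_old=0.5, base_new=0.2), 4))
```

If the two settings differ in more than the base rate, this correction has nothing to adjust for the remaining differences. That limitation is in line with the abstract's remark that Elkan's method rests on stricter assumptions than the logistic regression-based approach.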

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/68e2/5074325/2cbed0e24892/SIM-35-3949-g001.jpg

Similar articles

Calibrating machine learning approaches for probability estimation: A comprehensive comparison.

Stat Med. 2023 Dec 20;42(29):5451-5478. doi: 10.1002/sim.9921. Epub 2023 Oct 17.

Probability machines: consistent probability estimation using nonparametric learning machines.

Methods Inf Med. 2012;51(1):74-81. doi: 10.3414/ME00-01-0052. Epub 2011 Sep 14.

Probability estimation with machine learning methods for dichotomous and multicategory outcome: theory.

Biom J. 2014 Jul;56(4):534-63. doi: 10.1002/bimj.201300068. Epub 2014 Jan 29.

Probability estimation with machine learning methods for dichotomous and multicategory outcome: applications.

Biom J. 2014 Jul;56(4):564-83. doi: 10.1002/bimj.201300077. Epub 2014 Feb 12.

A closed testing procedure to select an appropriate method for updating prediction models.

Stat Med. 2017 Dec 10;36(28):4529-4539. doi: 10.1002/sim.7179. Epub 2016 Nov 28.

Random forests for the analysis of matched case-control studies.

BMC Bioinformatics. 2024 Aug 1;25(1):253. doi: 10.1186/s12859-024-05877-5.

The Integrated Calibration Index (ICI) and related metrics for quantifying the calibration of logistic regression models.

Stat Med. 2019 Sep 20;38(21):4051-4065. doi: 10.1002/sim.8281. Epub 2019 Jul 3.

A spline-based tool to assess and visualize the calibration of multiclass risk predictions.

J Biomed Inform. 2015 Apr;54:283-93. doi: 10.1016/j.jbi.2014.12.016. Epub 2015 Jan 9.

Unbiased split variable selection for random survival forests using maximally selected rank statistics.

Stat Med. 2017 Apr 15;36(8):1272-1284. doi: 10.1002/sim.7212. Epub 2017 Jan 15.

Cited by

Understanding overfitting in random forest for probability estimation: a visualization and simulation study.

Diagn Progn Res. 2024 Sep 27;8(1):14. doi: 10.1186/s41512-024-00177-1.

Propensity score matching: a tool for consumer risk modeling and portfolio underwriting.

J Appl Stat. 2024 Jan 9;51(12):2481-2488. doi: 10.1080/02664763.2024.2302058. eCollection 2024.

Validation of a machine learning algorithm for identifying infants at risk of hypoxic ischaemic encephalopathy in a large unseen data set.

Arch Dis Child Fetal Neonatal Ed. 2025 Apr 17;110(3):279-284. doi: 10.1136/archdischild-2024-327366.

Developing clinical prediction models: a step-by-step guide.

BMJ. 2024 Sep 3;386:e078276. doi: 10.1136/bmj-2023-078276.

Externally validated machine learning algorithm accurately predicts medial tibial stress syndrome in military trainees: a multicohort study.

BMJ Open Sport Exerc Med. 2023 Jun 13;9(2):e001566. doi: 10.1136/bmjsem-2023-001566. eCollection 2023.

Stability of clinical prediction models developed using statistical or machine learning methods.

Biom J. 2023 Dec;65(8):e2200302. doi: 10.1002/bimj.202200302. Epub 2023 Jul 19.

Predicting mortality in hemodialysis patients using machine learning analysis.

Clin Kidney J. 2020 Aug 11;14(5):1388-1395. doi: 10.1093/ckj/sfaa126. eCollection 2021 May.

Towards deep phenotyping pregnancy: a systematic review on artificial intelligence and machine learning methods to improve pregnancy outcomes.

Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbaa369.

Identification of important factors in an inpatient fall risk prediction model to improve the quality of care using EHR and electronic administrative data: A machine-learning approach.

Int J Med Inform. 2020 Nov;143:104272. doi: 10.1016/j.ijmedinf.2020.104272. Epub 2020 Sep 15.

Developing an Improved Statistical Approach for Survival Estimation in Bone Metastases Management: The Bone Metastases Ensemble Trees for Survival (BMETS) Model.

Int J Radiat Oncol Biol Phys. 2020 Nov 1;108(3):554-563. doi: 10.1016/j.ijrobp.2020.05.023. Epub 2020 May 22.
