optRF: Optimising random forest stability by determining the optimal number of trees.

Author information

Lange Thomas M, Gültas Mehmet, Schmitt Armin O, Heinrich Felix

Affiliations

Breeding Informatics Group, Georg-August University, Margarethe Von Wrangell-Weg 7, 37075, Göttingen, Germany.

Faculty of Agriculture, South Westphalia University of Applied Sciences, Lübecker Ring 2, 59494, Soest, Germany.

Publication information

BMC Bioinformatics. 2025 Mar 31;26(1):95. doi: 10.1186/s12859-025-06097-1.

DOI:10.1186/s12859-025-06097-1

PMID:40165065

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11959736/

Abstract

Machine learning is frequently used to make decisions based on big data. Among these techniques, random forest is particularly prominent. Although random forest is known to have many advantages, one aspect that is often overlooked is that it is a non-deterministic method that can produce different models from the same input data. This can have severe consequences for decision-making processes. In this study, we introduce a method to quantify the impact of non-determinism on predictions, variable importance estimates, and decisions based on the predictions or variable importance estimates. Our findings demonstrate that increasing the number of trees in a random forest improves its stability non-linearly, whereas computation time increases linearly. Consequently, we conclude that for any given data set there exists an optimal number of trees that maximises stability without unnecessarily increasing computation time. Based on these findings, we have developed the R package optRF, which models the relationship between the number of trees and the stability of random forest and provides recommendations for the optimal number of trees for any given data set.
