
optRF: Optimising random forest stability by determining the optimal number of trees.

Authors

Lange Thomas M, Gültas Mehmet, Schmitt Armin O, Heinrich Felix

Affiliations

Breeding Informatics Group, Georg-August University, Margarethe Von Wrangell-Weg 7, 37075, Göttingen, Germany.

Faculty of Agriculture, South Westphalia University of Applied Sciences, Lübecker Ring 2, 59494, Soest, Germany.

Publication information

BMC Bioinformatics. 2025 Mar 31;26(1):95. doi: 10.1186/s12859-025-06097-1.

DOI: 10.1186/s12859-025-06097-1
PMID: 40165065
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11959736/

Abstract

Machine learning is frequently used to make decisions based on big data. Among these techniques, random forest is particularly prominent. Although random forest is known to have many advantages, one aspect that is often overlooked is that it is a non-deterministic method that can produce different models using the same input data. This can have severe consequences for decision-making processes. In this study, we introduce a method to quantify the impact of non-determinism on predictions, variable importance estimates, and decisions based on the predictions or variable importance estimates. Our findings demonstrate that increasing the number of trees in random forests improves stability non-linearly, while computation time increases linearly. Consequently, we conclude that there exists an optimal number of trees for any given data set that maximises the stability without unnecessarily increasing the computation time. Based on these findings, we have developed the R package optRF, which models the relationship between the number of trees and the stability of random forest, providing recommendations for the optimal number of trees for any given data set.
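
The idea of measuring stability across repeated runs can be illustrated with a small sketch. It is not the optRF package or its API, and the abstract does not state which stability metric the authors use. The sketch assumes a toy bagged ensemble of regression stumps in place of a full random forest, and it assumes the mean pairwise Pearson correlation between the predictions of repeated fits as the stability score. All names (`fit_stump`, `prediction_stability`, `recommend_num_trees`, `make_toy_data`) and the toy data are hypothetical.

```python
import random
import statistics


def fit_stump(xs, ys, rng):
    """Fit one regression stump on a bootstrap sample, split at a random point."""
    n = len(xs)
    idx = [rng.randrange(n) for _ in range(n)]      # bootstrap resample
    candidates = sorted({xs[i] for i in idx})       # distinct sampled x values
    # Exclude the largest value so that both sides of the split are non-empty.
    threshold = rng.choice(candidates[:-1] or candidates)
    overall = statistics.fmean(ys[i] for i in idx)
    left = [ys[i] for i in idx if xs[i] <= threshold]
    right = [ys[i] for i in idx if xs[i] > threshold]
    return (threshold,
            statistics.fmean(left) if left else overall,
            statistics.fmean(right) if right else overall)


def fit_forest(xs, ys, num_trees, seed):
    """A toy 'forest': num_trees stumps grown from one seeded random stream."""
    rng = random.Random(seed)
    return [fit_stump(xs, ys, rng) for _ in range(num_trees)]


def predict(forest, x):
    """Average the stump predictions for a single input x."""
    return statistics.fmean(lo if x <= t else hi for t, lo, hi in forest)


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def prediction_stability(xs, ys, num_trees, repeats=5):
    """Assumed stability score: mean pairwise correlation of repeated fits."""
    runs = []
    for seed in range(repeats):                     # same data, different seeds
        forest = fit_forest(xs, ys, num_trees, seed)
        runs.append([predict(forest, x) for x in xs])
    pairs = [(i, j) for i in range(repeats) for j in range(i + 1, repeats)]
    return statistics.fmean(pearson(runs[i], runs[j]) for i, j in pairs)


def recommend_num_trees(stability_by_trees, target):
    """Hypothetical rule: smallest tree count whose stability reaches target."""
    for num_trees in sorted(stability_by_trees):
        if stability_by_trees[num_trees] >= target:
            return num_trees
    return None


def make_toy_data(n=60, seed=42):
    """Noisy linear toy data; purely illustrative."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys


xs, ys = make_toy_data()
results = {t: prediction_stability(xs, ys, t) for t in (5, 50, 300)}
for t, s in results.items():
    print(f"{t:>4} trees: stability = {s:.4f}")
print("smallest count reaching 0.99:", recommend_num_trees(results, 0.99))
```

Fitting cost in this sketch grows in proportion to the number of stumps, while the stability score is bounded above by 1, so each additional tree can add less stability than the one before. Picking the smallest tree count that reaches a target stability is only one possible rule; according to the abstract, optRF instead models the relationship between the number of trees and stability and bases its recommendation on that model.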

Figures

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8be2/11959736/5a0f957beb4d/12859_2025_6097_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8be2/11959736/2e9926e0ee41/12859_2025_6097_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8be2/11959736/3089872ec8b2/12859_2025_6097_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8be2/11959736/8f49ad8a31cf/12859_2025_6097_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8be2/11959736/9ccfd6b5ea83/12859_2025_6097_Fig5_HTML.jpg
