
Constructing bi-plots for random forest: Tutorial.

Affiliations

Department of Pharmacology and Toxicology, School of Nutrition, Toxicology and Translational Research in Metabolism (NUTRIM), Maastricht University Medical Center+, Maastricht, the Netherlands.

Laboratoire de Spectrochimie Infrarouge et Raman - LASIR CNRS - UMR 8516, Université de Lille, Bâtiment C5, F-59000, Lille, France; Molecular Imaging and Photonics Unit, Department of Chemistry, Katholieke Universiteit Leuven, Celestijnenlaan 200F, B-3001, Leuven, Belgium.

Publication information

Anal Chim Acta. 2020 Sep 22;1131:146-155. doi: 10.1016/j.aca.2020.06.043. Epub 2020 Jul 11.

DOI:10.1016/j.aca.2020.06.043
PMID:32928475
Abstract

Current technological developments have allowed for a significant increase in the volume and availability of data. Consequently, this has opened enormous opportunities for the machine learning and data science field, translating into the development of new algorithms in a wide range of applications in medical, biomedical, daily-life, and national security areas. Ensemble techniques are among the pillars of the machine learning field, and they can be defined as approaches in which multiple, complex, independent/uncorrelated, predictive models are subsequently combined by either averaging or voting to yield a higher model performance. Random forest (RF), a popular ensemble method, has been successfully applied in various domains due to its ability to build predictive models with high certainty and little need for model optimization. RF provides both a predictive model and an estimation of the variable importance. However, the estimation of the variable importance is based on thousands of trees, and therefore, it does not specify which variable is important for which sample group. The present study demonstrates an approach based on the pseudo-sample principle that allows for the construction of bi-plots (i.e. spin plots) associated with RF models. The pseudo-sample principle for RF is explained and demonstrated by using two simulated datasets and three different types of real data: political science, food chemistry, and human microbiome data. The pseudo-sample bi-plots, associated with RF and its unsupervised version, allow for a versatile visualization of multivariate models, of the variable importance, and of the relations among variables.
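The abstract names the pseudo-sample principle without spelling out the procedure. As the idea is generally described, a pseudo-sample is an artificial input in which one variable sweeps across its observed range while the remaining variables are held at a fixed baseline; feeding these inputs to a trained model traces how the model responds to that variable alone. The sketch below illustrates only this construction step, under stated assumptions: the function names, the column-median baseline, and the toy stand-in model are all hypothetical choices made for illustration, not the authors' implementation, and the projection of pseudo-samples into an RF-derived score space to draw the actual bi-plot is not shown.

```python
from statistics import median


def make_pseudo_samples(X, var_index, n_points=5):
    """Build pseudo-samples for one variable.

    The chosen variable sweeps evenly from its observed minimum to its
    observed maximum; every other variable is held at its column median
    (an assumed baseline for this sketch).
    """
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    columns = list(zip(*X))                      # column-wise view of the data
    baseline = [median(col) for col in columns]  # fixed values for the other variables
    lo, hi = min(columns[var_index]), max(columns[var_index])
    step = (hi - lo) / (n_points - 1)
    pseudo = []
    for k in range(n_points):
        row = list(baseline)
        row[var_index] = lo + k * step           # only this variable changes
        pseudo.append(row)
    return pseudo


def pseudo_sample_trajectories(X, predict, n_points=5):
    """Return, per variable index, the model response along its pseudo-samples."""
    n_vars = len(X[0])
    return {
        j: [predict(row) for row in make_pseudo_samples(X, j, n_points)]
        for j in range(n_vars)
    }


# Toy data: three samples, two variables (illustrative values only).
X_demo = [[0.0, 10.0], [1.0, 20.0], [2.0, 30.0]]


# Hypothetical stand-in for a trained model: it reacts only to the first variable.
def toy_predict(row):
    return 1 if row[0] > 1.0 else 0


trajectories = pseudo_sample_trajectories(X_demo, toy_predict, n_points=3)
```

In this toy case the response changes along the pseudo-samples of the first variable and stays constant along those of the second, which is the kind of per-variable behaviour a global importance score alone would not show. With a real random forest, `toy_predict` would be replaced by the fitted model's prediction for a single row.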


