Random Forest Missing Data Algorithms

Authors

Tang Fei, Ishwaran Hemant

Affiliation

Division of Biostatistics, University of Miami.

Publication

Stat Anal Data Min. 2017 Dec;10(6):363-377. doi: 10.1002/sam.11348. Epub 2017 Jun 13.

DOI: 10.1002/sam.11348
PMID: 29403567
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5796790/
Abstract

Random forest (RF) missing data algorithms are an attractive approach for imputing missing data. They can handle mixed types of missing data, they adapt to interactions and nonlinearity, and they have the potential to scale to big data settings. Many different RF imputation algorithms currently exist, but there is relatively little guidance about their efficacy. Using a large, diverse collection of data sets, the imputation performance of various RF algorithms was assessed under different missing data mechanisms. The algorithms included proximity imputation, on-the-fly imputation, and imputation using multivariate unsupervised and supervised splitting, the latter class representing a generalization of a promising new imputation algorithm called missForest. Our findings reveal RF imputation to be generally robust, with performance improving with increasing correlation. Performance was good under moderate to high missingness, and even (in certain cases) when data were missing not at random.
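
To make the missForest-style idea mentioned in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a purely numeric array, uses scikit-learn's RandomForestRegressor as the forest, and runs a fixed number of passes instead of missForest's stopping rule. The function name rf_impute and its parameters are hypothetical, chosen only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rf_impute(X, n_iter=5, n_trees=50, seed=0):
    """missForest-style imputation sketch for a numeric 2-D array with NaNs.

    Assumptions (not from the paper): numeric columns only, a fixed number
    of passes, and at least one observed value in every column.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    if missing.all(axis=0).any():
        raise ValueError("every column needs at least one observed value")

    # Initial guess: fill each missing entry with its column mean.
    filled = np.where(missing, np.nanmean(X, axis=0), X)

    # Visit columns from least to most missing.
    order = np.argsort(missing.sum(axis=0))

    for _ in range(n_iter):
        for j in order:
            miss_j = missing[:, j]
            if not miss_j.any():
                continue  # nothing to impute in this column
            # Predict column j from all other (currently filled) columns.
            others = np.delete(filled, j, axis=1)
            forest = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
            forest.fit(others[~miss_j], filled[~miss_j, j])
            filled[miss_j, j] = forest.predict(others[miss_j])
    return filled


# Small demonstration on synthetic, correlated data with values
# removed completely at random (roughly 10% of entries).
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
data = np.column_stack([x1,
                        2 * x1 + rng.normal(scale=0.1, size=200),
                        rng.normal(size=200)])
data_missing = np.where(rng.random(data.shape) < 0.1, np.nan, data)
imputed = rf_impute(data_missing)
```

Observed entries are left untouched; only the originally missing cells are overwritten on each pass. Categorical variables would need a classifier in place of the regressor, which this sketch omits.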


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/efa4/5796790/03ae21d0a2e6/nihms884039f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/efa4/5796790/5f97465d11d4/nihms884039f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/efa4/5796790/92291c22efed/nihms884039f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/efa4/5796790/4be19477a2b8/nihms884039f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/efa4/5796790/a836b764e990/nihms884039f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/efa4/5796790/071d34545616/nihms884039f6.jpg
