Addressing Measurement Error in Random Forests Using Quantitative Bias Analysis.

Publication information

Am J Epidemiol. 2021 Sep 1;190(9):1830-1840. doi: 10.1093/aje/kwab010.

Abstract

Although variables are often measured with error, the impact of measurement error on machine-learning predictions is seldom quantified. The purpose of this study was to assess the impact of measurement error on the performance of random-forest models and variable importance. First, we assessed the impact of misclassification (i.e., measurement error of categorical variables) of predictors on random-forest model performance (e.g., accuracy, sensitivity) and variable importance (mean decrease in accuracy) using data from the National Comorbidity Survey Replication (2001-2003). Second, we created simulated data sets in which we knew the true model performance and variable importance measures and could verify that quantitative bias analysis was recovering the truth in misclassified versions of the data sets. Our findings showed that measurement error in the data used to construct random forests can distort model performance and variable importance measures and that bias analysis can recover the correct results. This study highlights the utility of applying quantitative bias analysis in machine learning to quantify the impact of measurement error on study results.
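The abstract does not spell out the bias-analysis procedure, so the following is only a minimal sketch of one common form of quantitative bias analysis for a misclassified binary predictor, not the authors' method. It assumes nondifferential misclassification with a known sensitivity and specificity; the function names (`corrected_count`, `predictive_values`, `reclassify`) and all numeric values are hypothetical.

```python
import random


def corrected_count(observed_positive, total, sensitivity, specificity):
    """Back-calculate the expected number of truly positive records.

    Uses the standard correction A = (a* - (1 - Sp) * N) / (Se + Sp - 1),
    clamped to the feasible range [0, N].
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    true_positive = (observed_positive - (1.0 - specificity) * total) / denom
    return min(max(true_positive, 0.0), float(total))


def predictive_values(observed_positive, total, sensitivity, specificity):
    """Return (PPV, NPV) of the observed predictor implied by Se and Sp."""
    true_pos = corrected_count(observed_positive, total, sensitivity, specificity)
    true_neg = total - true_pos
    observed_negative = total - observed_positive
    ppv = sensitivity * true_pos / observed_positive if observed_positive else 0.0
    npv = specificity * true_neg / observed_negative if observed_negative else 0.0
    return ppv, npv


def reclassify(observed, sensitivity, specificity, rng):
    """Draw one bias-adjusted copy of a 0/1 predictor column."""
    ppv, npv = predictive_values(sum(observed), len(observed), sensitivity, specificity)
    adjusted = []
    for value in observed:
        if value == 1:
            # An observed 1 is kept with probability PPV.
            adjusted.append(1 if rng.random() < ppv else 0)
        else:
            # An observed 0 is kept with probability NPV.
            adjusted.append(0 if rng.random() < npv else 1)
    return adjusted


if __name__ == "__main__":
    rng = random.Random(0)
    observed = [1] * 300 + [0] * 700  # hypothetical misclassified predictor
    print(corrected_count(300, 1000, sensitivity=0.8, specificity=0.9))
    adjusted = reclassify(observed, 0.8, 0.9, rng)
    print(sum(adjusted))  # count of 1s in this particular adjusted draw
```

In a workflow of this kind, one would presumably refit the random forest on many adjusted copies of the data and summarize accuracy, sensitivity, and mean decrease in accuracy across the draws; that step is omitted here to keep the sketch free of third-party dependencies.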

Similar articles

MEBoost: Variable selection in the presence of measurement error.
Stat Med. 2019 Jul 10;38(15):2705-2718. doi: 10.1002/sim.8130. Epub 2019 Mar 11.

Cited by

Evaluating Binary Outcome Classifiers Estimated from Survey Data.
Epidemiology. 2024 Nov 1;35(6):805-812. doi: 10.1097/EDE.0000000000001776. Epub 2024 Aug 14.

Predictive models of miscarriage on the basis of data from a preconception cohort study.
Fertil Steril. 2024 Jul;122(1):140-149. doi: 10.1016/j.fertnstert.2024.04.007. Epub 2024 Apr 10.

Deep Survival Analysis With Clinical Variables for COVID-19.
IEEE J Transl Eng Health Med. 2023 Mar 14;11:223-231. doi: 10.1109/JTEHM.2023.3256966. eCollection 2023.

Detection of child depression using machine learning methods.
PLoS One. 2021 Dec 16;16(12):e0261131. doi: 10.1371/journal.pone.0261131. eCollection 2021.
