Is there a competitive advantage to using multivariate statistical or machine learning methods over the Bross formula in the hdPS framework for bias and variance estimation?

Author information

Ehsanul Karim Mohammad, Lei Yang

Affiliations

School of Population and Public Health, University of British Columbia, Vancouver, British Columbia, Canada.

Centre for Advancing Health Outcomes, St. Paul's Hospital, Vancouver, British Columbia, Canada.

Publication information

PLoS One. 2025 May 28;20(5):e0324639. doi: 10.1371/journal.pone.0324639. eCollection 2025.

Abstract

PURPOSE

We aimed to evaluate proxy selection methods in high-dimensional propensity score (hdPS) analysis. Specifically, this study systematically compared the performance of traditional statistical methods and machine learning approaches within the hdPS framework, focusing on bias, standard error (SE), and coverage under different exposure and outcome prevalence scenarios.
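
The Bross formula named in the title is the step that Bross-based hdPS uses to rank candidate proxies before they enter the propensity score model. As it is commonly described, each binary proxy C receives a bias multiplier Bias_M = [P_C1 (RR_CD - 1) + 1] / [P_C0 (RR_CD - 1) + 1], where P_C1 and P_C0 are the proxy's prevalence among exposed and unexposed patients and RR_CD is the proxy-outcome relative risk; proxies are then ranked by |log(Bias_M)|. The Python sketch below illustrates that ranking under stated assumptions: the ProxyCounts container, the toy counts, and the continuity correction eps are hypothetical and not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ProxyCounts:
    """Hypothetical 2x2 summary counts for one binary proxy covariate C."""
    name: str
    c1_exposed: int     # exposed patients with C = 1
    n_exposed: int      # all exposed patients
    c1_unexposed: int   # unexposed patients with C = 1
    n_unexposed: int    # all unexposed patients
    events_c1: int      # outcome events among patients with C = 1
    n_c1: int           # all patients with C = 1
    events_c0: int      # outcome events among patients with C = 0
    n_c0: int           # all patients with C = 0


def bross_multiplier(p: ProxyCounts, eps: float = 0.5) -> float:
    """Bross bias multiplier (P_C1*(RR_CD-1)+1) / (P_C0*(RR_CD-1)+1)."""
    pc1 = p.c1_exposed / p.n_exposed      # proxy prevalence among exposed
    pc0 = p.c1_unexposed / p.n_unexposed  # proxy prevalence among unexposed
    # Proxy-outcome relative risk. eps is an ASSUMED continuity correction
    # added to event and non-event cells so zero counts do not break the ratio.
    risk_c1 = (p.events_c1 + eps) / (p.n_c1 + 2 * eps)
    risk_c0 = (p.events_c0 + eps) / (p.n_c0 + 2 * eps)
    rr_cd = risk_c1 / risk_c0
    # Use the strength of the association regardless of its direction.
    rr_cd = max(rr_cd, 1.0 / rr_cd)
    return (pc1 * (rr_cd - 1) + 1) / (pc0 * (rr_cd - 1) + 1)


def rank_proxies(proxies: Sequence[ProxyCounts], k: int) -> List[str]:
    """Keep the k proxies with the largest |log(bias multiplier)|."""
    ranked = sorted(proxies, key=lambda p: abs(math.log(bross_multiplier(p))),
                    reverse=True)
    return [p.name for p in ranked[:k]]


# Hypothetical cohort: 1000 exposed, 1000 unexposed, 200 outcome events.
proxies = [
    ProxyCounts("code_A", 300, 1000, 100, 1000, 120, 400, 80, 1600),
    ProxyCounts("code_B", 100, 1000, 100, 1000, 20, 200, 180, 1800),
    ProxyCounts("code_C", 50, 1000, 10, 1000, 30, 60, 170, 1940),
]
print(rank_proxies(proxies, k=2))
```

A proxy whose prevalence is identical in both exposure groups (code_B here) gets a multiplier of exactly 1 and ranks last, whatever its association with the outcome, because the Bross multiplier scores each proxy on its own rather than jointly with the others.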

METHODS

We conducted a plasmode simulation study using data from the National Health and Nutrition Examination Survey (NHANES) cycles from 2013 to 2018. We compared methods including the kitchen sink model, Bross-based hdPS, Hybrid hdPS, LASSO, Elastic Net, Random Forest, XGBoost, and Genetic Algorithm (GA). Each method was combined with inverse probability weighting, and performance was assessed based on bias, mean squared error (MSE), coverage probability, and SE estimation across three epidemiological scenarios: frequent exposure and outcome, rare exposure with frequent outcome, and frequent exposure with rare outcome.
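
All of the compared methods feed the same final step: the fitted propensity scores are turned into inverse probability weights. A minimal Python sketch of that step, assuming the propensity scores have already been produced by one of the compared models (kitchen sink, hdPS, LASSO, XGBoost, and so on): exposed patients get weight 1/PS, unexposed patients get 1/(1 - PS), and the weighted outcome risks are contrasted. The risk difference and the normalized (Hajek) form are illustrative choices only; the abstract does not state which effect measure or weighting variant the study used, and the function name is hypothetical.

```python
from typing import Sequence


def ipw_risk_difference(exposed: Sequence[int], outcome: Sequence[int],
                        ps: Sequence[float]) -> float:
    """Normalized (Hajek) IPW estimate of the risk difference."""
    if not (len(exposed) == len(outcome) == len(ps)):
        raise ValueError("inputs must have equal length")
    sum_wy1 = sum_w1 = sum_wy0 = sum_w0 = 0.0
    for a, y, e in zip(exposed, outcome, ps):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        if a == 1:
            w = 1.0 / e            # exposed: inverse of P(exposed | covariates)
            sum_wy1 += w * y
            sum_w1 += w
        else:
            w = 1.0 / (1.0 - e)    # unexposed: inverse of P(unexposed | covariates)
            sum_wy0 += w * y
            sum_w0 += w
    # Weighted outcome risk in each group, then their difference.
    return sum_wy1 / sum_w1 - sum_wy0 / sum_w0


# Hypothetical toy data: four patients with propensity scores from any PS model.
rd = ipw_risk_difference(exposed=[1, 1, 0, 0], outcome=[1, 0, 0, 0],
                         ps=[0.8, 0.4, 0.5, 0.5])
print(rd)
```

Because neither unexposed patient has the outcome, the estimate here reduces to the weighted risk among the exposed, 1.25 / 3.75, or about 0.333.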

RESULTS

XGBoost consistently performed well in terms of MSE and coverage, making it effective in scenarios that prioritize precision. However, it exhibited higher bias, particularly in rare exposure scenarios, suggesting it is less suited when minimizing bias is critical. In contrast, GA showed substantial limitations, with consistently high bias and MSE, making it the least reliable method. The Bross-based hdPS and Hybrid hdPS methods provided a balanced approach, with low bias and moderate MSE, although coverage varied by scenario. Rare outcome scenarios generally yielded lower MSE and better precision, whereas rare exposure scenarios were associated with higher bias and MSE. Notably, traditional statistical approaches such as forward selection and backward elimination performed comparably to more sophisticated machine learning methods in terms of bias and coverage, suggesting that these simpler, computationally cheaper approaches may be viable alternatives.
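
The pattern reported for XGBoost, higher bias yet good MSE, follows from the identity MSE = bias² + variance, with both terms computed over simulation replicates: a method with somewhat larger bias can still have a small MSE if its estimates vary little. The Python sketch below computes the reported criteria from hypothetical replicate results. It assumes a Wald-type 95% interval (estimate ± 1.96 × SE) for coverage and treats the comparison of empirical SE with the mean model-based SE as one common way to judge SE estimation; the function name, field names, and numbers are illustrative, not the study's code or results.

```python
import math
import statistics
from typing import Dict, Sequence


def performance(estimates: Sequence[float], ses: Sequence[float],
                truth: float, z: float = 1.96) -> Dict[str, float]:
    """Summarize simulation replicates against the known true effect."""
    if len(estimates) != len(ses) or not estimates:
        raise ValueError("need one SE per estimate and at least one replicate")
    mean_est = statistics.fmean(estimates)
    bias = mean_est - truth
    # Population variance (divide by n) makes MSE = bias^2 + variance exact.
    variance = statistics.pvariance(estimates, mu=mean_est)
    mse = statistics.fmean((e - truth) ** 2 for e in estimates)
    # Coverage: share of Wald intervals estimate +/- z*SE containing the truth.
    covered = sum(1 for e, s in zip(estimates, ses)
                  if e - z * s <= truth <= e + z * s)
    return {
        "bias": bias,
        "empirical_se": math.sqrt(variance),      # spread of estimates across replicates
        "mean_model_se": statistics.fmean(ses),   # average SE reported by the model
        "mse": mse,
        "coverage": covered / len(estimates),
    }


# Hypothetical replicate results for one method in one scenario.
result = performance(estimates=[0.1, -0.1, 0.2, 0.0],
                     ses=[0.15, 0.05, 0.1, 0.1], truth=0.0)
print(result)
```

In this toy run the estimates are biased upward by 0.05 on average, MSE is 0.015 (0.05² + 0.0125), and the two narrow intervals that miss the true value pull coverage down to 0.5.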

CONCLUSION

The results highlight the importance of selecting hdPS methods based on the specific characteristics of the data, such as exposure and outcome prevalence. While advanced machine learning methods such as XGBoost can enhance precision, simpler methods such as forward selection or backward elimination may offer similar performance in terms of bias and coverage with fewer computational demands. Tailoring the choice of method to the epidemiological scenario is essential for optimizing the balance between bias reduction and precision.
