Use of Machine Learning to Compare Disease Risk Scores and Propensity Scores Across Complex Confounding Scenarios: A Simulation Study.

Authors

Guo Yuchen, Strauss Victoria Y, Khalid Sara, Prieto-Alhambra Daniel

Affiliations

Centre for Statistics in Medicine, University of Oxford, Oxford, UK.

Boehringer Ingelheim Pharma GmbH & Co. KG, Germany.

Publication information

Pharmacoepidemiol Drug Saf. 2025 Jun;34(6):e70165. doi: 10.1002/pds.70165.

Abstract

PURPOSE

During the surge of COVID-19 treatments in the second quarter of 2020, treatment prevalence was low and outcome risk was high. Motivated by this setting, we conducted a simulation study comparing disease risk scores (DRS) and propensity scores (PS) across a range of scenarios with different treatment prevalences and outcome risks.
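
For orientation, the two scores are commonly defined as below. The notation (covariates X, binary treatment T, binary outcome Y) is an assumption made for illustration; the abstract does not state the exact definitions used in the study.

```latex
\[
\mathrm{PS}(X) = \Pr(T = 1 \mid X), \qquad
\mathrm{DRS}(X) = \Pr(Y = 1 \mid T = 0, X)
\]
```

Under these definitions, the PS model is fitted on treatment status, so it has few treated cases to learn from when treatment prevalence is low, whereas the DRS model is fitted on the outcome, typically among the untreated.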

METHODS

Four methods were used to estimate PS and DRS: logistic regression (reference method), least absolute shrinkage and selection operator (LASSO), multilayer perceptron (MLP), and XGBoost. Monte Carlo simulations generated data across 25 scenarios varying in treatment prevalence, outcome risk, data complexity, and sample size. Average treatment effects were calculated after matching. Relative bias and the average absolute standardized mean difference (ASMD) were reported.
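
The matching step and the two reported metrics can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the matching algorithm, the caliper, or the exact metric formulas, so the greedy 1:1 nearest-neighbour matching, the `caliper` value, the pooled-standard-deviation form of the standardized mean difference, and all function names below are hypothetical choices.

```python
from statistics import mean, variance


def relative_bias(estimates, true_effect):
    """Relative bias (%) of Monte Carlo estimates against a known true effect."""
    return 100.0 * (mean(estimates) - true_effect) / true_effect


def abs_smd(treated, control):
    """Absolute standardized mean difference for a single covariate.

    Assumed form: |mean_t - mean_c| / sqrt((var_t + var_c) / 2).
    """
    pooled_sd = ((variance(treated) + variance(control)) / 2) ** 0.5
    if pooled_sd == 0:
        return 0.0  # constant covariate: treat as perfectly balanced
    return abs(mean(treated) - mean(control)) / pooled_sd


def average_asmd(treated_rows, control_rows):
    """Average of abs_smd over all covariates; each row is one subject."""
    n_cov = len(treated_rows[0])
    return mean(
        abs_smd([r[j] for r in treated_rows], [r[j] for r in control_rows])
        for j in range(n_cov)
    )


def greedy_match(treated_scores, control_scores, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    Works on any scalar score (PS or DRS). Returns (treated_idx, control_idx)
    pairs; treated subjects with no control within the caliper are dropped.
    """
    available = set(range(len(control_scores)))
    pairs = []
    for i, score in enumerate(treated_scores):
        if not available:
            break
        j = min(available, key=lambda k: abs(control_scores[k] - score))
        if abs(control_scores[j] - score) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs
```

In a simulation loop, one would estimate the score for every subject, call `greedy_match`, compute `average_asmd` on the matched covariates, estimate the treatment effect in the matched sample, and finally pass the per-replicate estimates to `relative_bias`.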

RESULTS

Estimation bias increased as treatment prevalence decreased. DRS showed lower bias than PS when treatment prevalence was below 0.1, especially in nonlinear data. However, DRS did not outperform PS in linear or small-sample data. PS had comparable or lower bias than DRS when treatment prevalence was between 0.1 and 0.5. The three machine learning (ML) methods performed similarly, with LASSO and XGBoost outperforming the reference method in some nonlinear scenarios. ASMD results indicated that DRS was less affected than PS by decreasing treatment prevalence.

CONCLUSION

With nonlinear data, DRS reduced bias compared with PS in scenarios with low treatment prevalence, while PS was preferable when treatment prevalence was greater than 0.1, regardless of the outcome risk. ML methods can outperform logistic regression for PS and DRS estimation. Both decreasing the sample size and adding nonlinearity and nonadditivity to the data increased bias for all methods tested.
