Can We Train Machine Learning Methods to Outperform the High-dimensional Propensity Score Algorithm?

Publication information

Epidemiology. 2018 Mar;29(2):191-198. doi: 10.1097/EDE.0000000000000787.

Abstract

The use of retrospective health care claims datasets is frequently criticized for the lack of complete information on potential confounders. By using patients' health status-related information from claims datasets as surrogates or proxies for mismeasured and unobserved confounders, the high-dimensional propensity score algorithm enables us to reduce bias. Using a previously published cohort study of post-myocardial infarction statin use (1998-2012), we compare the performance of the algorithm with a number of popular machine learning approaches for confounder selection in high-dimensional covariate spaces: random forest, least absolute shrinkage and selection operator (LASSO), and elastic net. Our results suggest that, when the data analysis is done with epidemiologic principles in mind, machine learning methods perform as well as the high-dimensional propensity score algorithm. Using a plasmode framework that mimicked the empirical data, we also showed that a hybrid of the machine learning and high-dimensional propensity score algorithms generally performs slightly better than either alone in terms of mean squared error, when a bias-based analysis is used.
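The abstract compares methods by mean squared error in a plasmode simulation, where the true effect is known by construction. As a minimal sketch of how such a comparison can be summarized (not the authors' code), the snippet below computes bias, variance, and mean squared error for a set of simulated estimates. The function name `summarize_estimates`, the method labels, and all numbers are hypothetical and chosen only for illustration.

```python
from statistics import fmean


def summarize_estimates(estimates, true_effect):
    """Return bias, variance, and MSE of simulated estimates around a known truth."""
    estimates = list(estimates)
    mean_est = fmean(estimates)
    bias = mean_est - true_effect
    # Variance across simulation replicates, dividing by n (not n - 1),
    # so that mse == bias**2 + variance holds up to rounding error.
    variance = fmean([(e - mean_est) ** 2 for e in estimates])
    mse = fmean([(e - true_effect) ** 2 for e in estimates])
    return {"bias": bias, "variance": variance, "mse": mse}


# Hypothetical log hazard ratio estimates from five simulated datasets
# (illustrative values only, not results from the study).
true_log_hr = -0.20
simulated = {
    "method_a": [-0.26, -0.18, -0.22, -0.30, -0.24],
    "method_b": [-0.21, -0.05, -0.35, -0.19, -0.20],
}

for name, estimates in simulated.items():
    summary = summarize_estimates(estimates, true_log_hr)
    print(name, {key: round(value, 4) for key, value in summary.items()})
```

Because mean squared error is the sum of squared bias and variance, a method with slightly more bias can still achieve a lower mean squared error if its estimates vary less across simulated datasets.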
