
CVtreeMLE: Efficient Estimation of Mixed Exposures using Data Adaptive Decision Trees and Cross-Validated Targeted Maximum Likelihood Estimation in R.

Authors

McCoy David, Hubbard Alan, Van der Laan Mark

Affiliations

Division of Environmental Health Sciences, University of California, Berkeley, CA, United States of America.

Department of Biostatistics, University of California, Berkeley, CA, United States of America.

Publication

J Open Source Softw. 2023;8(82). doi: 10.21105/joss.04181. Epub 2023 Feb 21.

DOI:10.21105/joss.04181
PMID:37398941
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10312067/
Abstract

Statistical causal inference of mixed exposures has been limited by reliance on parametric models and, until recently, by researchers considering only one exposure at a time, usually estimated as a beta coefficient in a generalized linear regression model (GLM). This independent assessment of exposures poorly estimates the joint impact of the same exposures in a realistic exposure setting. Marginal methods for mixture variable selection, such as ridge/lasso regression, are biased by linear assumptions, and the interactions modeled are chosen by the user. Clustering methods such as principal component regression lose both interpretability and valid inference. Newer mixture methods such as quantile g-computation (Keil et al., 2020) are biased by linear/additive assumptions. More flexible methods such as Bayesian kernel machine regression (BKMR) (Bobb et al., 2014) are sensitive to the choice of tuning parameters, are computationally taxing, and lack an interpretable and robust summary statistic of dose-response relationships. No method currently exists that finds the best flexible model to adjust for covariates while applying a non-parametric model that targets interactions in a mixture and delivers valid inference for a target parameter. Non-parametric methods such as decision trees are a useful tool for evaluating combined exposures: they find partitions in the joint-exposure (mixture) space that best explain the variance in an outcome. However, current methods that use decision trees to draw statistical inference about interactions are biased and prone to overfitting, because they use the full data both to identify nodes in the tree and to make statistical inference given these nodes. Other methods derive inference from an independent test set and therefore do not use the full data.

The CVtreeMLE R package provides researchers in (bio)statistics, epidemiology, and environmental health sciences with access to state-of-the-art statistical methodology for evaluating the causal effects of a data-adaptively determined mixed exposure using decision trees. Our target audience is analysts who would normally use a potentially biased GLM-based model for a mixed exposure. Instead, we hope to provide users with a non-parametric statistical machine: users simply specify the exposures, covariates, and outcome, and CVtreeMLE then determines whether a best-fitting decision tree exists and delivers interpretable results.
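
The sample-splitting idea in the abstract (find the partition on one part of the data, estimate its effect on another, and rotate so that all data are used) can be illustrated with a small sketch. This is not the CVtreeMLE implementation or its API. It is written in Python with only the standard library, the function names `simulate`, `best_split`, and `cv_rule_effect` are hypothetical, a single-split stump stands in for a full decision tree, and a plain difference in means stands in for the targeted maximum likelihood estimator. There is no covariate adjustment, and the data are simulated.

```python
import random
import statistics


def simulate(n, seed=0):
    """Simulated data (an assumption for illustration): two exposures on [0, 1];
    the outcome is shifted by 2 only when exposure 0 exceeds 0.6."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a0, a1 = rng.random(), rng.random()
        y = (2.0 if a0 > 0.6 else 0.0) + rng.gauss(0.0, 1.0)
        rows.append((a0, a1, y))
    return rows


def best_split(rows):
    """Return (exposure index, threshold) of the single split that minimizes
    the within-group sum of squared errors of the outcome."""
    best = (float("inf"), None, None)
    for j in (0, 1):
        values = sorted(r[j] for r in rows)
        for q in range(1, 10):  # candidate thresholds at the deciles
            t = values[len(values) * q // 10]
            left = [r[2] for r in rows if r[j] <= t]
            right = [r[2] for r in rows if r[j] > t]
            if not left or not right:
                continue
            sse = (statistics.pvariance(left) * len(left)
                   + statistics.pvariance(right) * len(right))
            if sse < best[0]:
                best = (sse, j, t)
    return best[1], best[2]


def cv_rule_effect(rows, k=5):
    """For each fold: learn the split on the other k-1 folds, then estimate the
    mean outcome difference across that split on the held-out fold only."""
    folds = [rows[i::k] for i in range(k)]
    results = []
    for i, valid in enumerate(folds):
        train = [r for m, fold in enumerate(folds) if m != i for r in fold]
        j, t = best_split(train)
        above = [r[2] for r in valid if r[j] > t]
        below = [r[2] for r in valid if r[j] <= t]
        diff = statistics.fmean(above) - statistics.fmean(below)
        results.append((j, t, diff))
    pooled = statistics.fmean([d for _, _, d in results])
    return results, pooled


results, pooled = cv_rule_effect(simulate(1000, seed=1), k=5)
for j, t, diff in results:
    print(f"exposure {j} > {t:.2f}: held-out mean difference {diff:.2f}")
print(f"pooled estimate: {pooled:.2f}")
```

Because each fold's rule is chosen without looking at the observations used to estimate its effect, the estimate is not inflated by the search over splits, while every observation still contributes to estimation in exactly one fold.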


Similar articles

1. CVtreeMLE: Efficient Estimation of Mixed Exposures using Data Adaptive Decision Trees and Cross-Validated Targeted Maximum Likelihood Estimation in R.
   J Open Source Softw. 2023;8(82). doi: 10.21105/joss.04181. Epub 2023 Feb 21.
2. Part 1. Statistical Learning Methods for the Effects of Multiple Air Pollution Constituents.
   Res Rep Health Eff Inst. 2015 Jun(183 Pt 1-2):5-50.
3. Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
   Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
4. Statistical software for analyzing the health effects of multiple concurrent exposures via Bayesian kernel machine regression.
   Environ Health. 2018 Aug 20;17(1):67. doi: 10.1186/s12940-018-0413-y.
5. Semiparametric Estimation of the Impacts of Longitudinal Interventions on Adolescent Obesity using Targeted Maximum-Likelihood: Accessible Estimation with the ltmle Package.
   J Causal Inference. 2014 Mar;2(1):95-108. doi: 10.1515/jci-2013-0025.
6. SuperNOVA: Semi-Parametric Identification and Estimation of Interaction and Effect Modification in Mixed Exposures using Stochastic Interventions in R.
   J Open Source Softw. 2023;8(91). doi: 10.21105/joss.05422. Epub 2023 Nov 5.
7. Joint mixed-effects models for causal inference with longitudinal data.
   Stat Med. 2018 Feb 28;37(5):829-846. doi: 10.1002/sim.7567. Epub 2017 Dec 4.
8. Bayesian semi-parametric G-computation for causal inference in a cohort study with MNAR dropout and death.
   J R Stat Soc Ser C Appl Stat. 2021 Mar;70(2):398-414. doi: 10.1111/rssc.12464. Epub 2021 Jan 6.
9. Performance of variable and function selection methods for estimating the nonlinear health effects of correlated chemical mixtures: A simulation study.
   Stat Med. 2020 Nov 30;39(27):3947-3967. doi: 10.1002/sim.8701. Epub 2020 Sep 17.
10. Semi-Parametric Estimation and Inference for the Mean Outcome of the Single Time-Point Intervention in a Causally Connected Population.
    J Causal Inference. 2017 Mar;5(1). doi: 10.1515/jci-2016-0003. Epub 2016 Nov 29.

References cited in this article

1. A review of practical statistical methods used in epidemiological studies to estimate the health effects of multi-pollutant mixture.
   Environ Pollut. 2022 Aug 1;306:119356. doi: 10.1016/j.envpol.2022.119356. Epub 2022 Apr 27.
2. A Quantile-Based g-Computation Approach to Addressing the Effects of Exposure Mixtures.
   Environ Health Perspect. 2020 Apr;128(4):47004. doi: 10.1289/EHP5838. Epub 2020 Apr 7.
3. Recursive partitioning for heterogeneous causal effects.
   Proc Natl Acad Sci U S A. 2016 Jul 5;113(27):7353-60. doi: 10.1073/pnas.1510489113.
4. Statistical Inference for Data Adaptive Target Parameters.
   Int J Biostat. 2016 May 1;12(1):3-19. doi: 10.1515/ijb-2015-0013.
5. Bayesian kernel machine regression for estimating the health effects of multi-pollutant mixtures.
   Biostatistics. 2015 Jul;16(3):493-508. doi: 10.1093/biostatistics/kxu058. Epub 2014 Dec 22.
6. Unraveling the health effects of environmental mixtures: an NIEHS priority.
   Environ Health Perspect. 2013 Jan;121(1):A6-8. doi: 10.1289/ehp.1206182.