ESTIMATING POPULATION AVERAGE CAUSAL EFFECTS IN THE PRESENCE OF NON-OVERLAP: THE EFFECT OF NATURAL GAS COMPRESSOR STATION EXPOSURE ON CANCER MORTALITY.

Authors

Nethery Rachel C, Mealli Fabrizia, Dominici Francesca

Affiliations

Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA.

Department of Statistics, Informatics, Applications, University of Florence, Florence, Italy.

Publication information

Ann Appl Stat. 2019 Jun;13(2):1242-1267. doi: 10.1214/18-AOAS1231. Epub 2019 Jun 17.

DOI: 10.1214/18-AOAS1231

PMID: 31346355

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6658123/

Abstract

Most causal inference studies rely on the assumption of overlap to estimate population or sample average causal effects. When data suffer from non-overlap, estimation of these estimands requires reliance on model specifications, due to poor data support. All existing methods to address non-overlap, such as trimming or down-weighting data in regions of poor data support, change the estimand so that inference cannot be made on the sample or the underlying population. In environmental health research settings, where study results are often intended to influence policy, population-level inference may be critical, and changes in the estimand can diminish the impact of the study results, because estimates may not be representative of effects in the population of interest to policymakers. Researchers may be willing to make additional, minimal modeling assumptions in order to preserve the ability to estimate population average causal effects. We seek to make two contributions on this topic. First, we propose a flexible, data-driven definition of propensity score overlap and non-overlap regions. Second, we develop a novel Bayesian framework to estimate population average causal effects with minor model dependence and appropriately large uncertainties in the presence of non-overlap and causal effect heterogeneity. In this approach, the tasks of estimating causal effects in the overlap and non-overlap regions are delegated to two distinct models, suited to the degree of data support in each region. Tree ensembles are used to non-parametrically estimate individual causal effects in the overlap region, where the data can speak for themselves. In the non-overlap region, where insufficient data support means reliance on model specification is necessary, individual causal effects are estimated by extrapolating trends from the overlap region via a spline model. The promising performance of our method is demonstrated in simulations. Finally, we utilize our method to perform a novel investigation of the causal effect of natural gas compressor station exposure on cancer outcomes. Code and data to implement the method and reproduce all simulations and analyses are available on GitHub (https://github.com/rachelnethery/overlap).
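The two-region idea described in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation (their code is in the GitHub repository linked above). Assumptions: propensity scores are already estimated; the overlap rule (a unit is in the overlap region if at least `min_opposite` units from the other exposure group have propensity scores within `window` of its own) and all function and parameter names are hypothetical; the individual effect estimates `tau_hat` are taken as given (the paper obtains them from tree ensembles); and a straight-line least-squares fit on the propensity score stands in for the paper's spline extrapolation. Only point estimates are produced.

```python
from statistics import fmean


def overlap_mask(ps, z, window=0.05, min_opposite=5):
    """Return a per-unit boolean: True if the unit lies in the overlap region.

    Hypothetical rule (not necessarily the paper's): unit i is in the overlap
    region when at least `min_opposite` units from the *other* exposure group
    have a propensity score within `window` of ps[i]. O(n^2), fine for a sketch.
    """
    mask = []
    for p_i, z_i in zip(ps, z):
        n_opposite = sum(
            1 for p_j, z_j in zip(ps, z) if z_j != z_i and abs(p_j - p_i) <= window
        )
        mask.append(n_opposite >= min_opposite)
    return mask


def fit_line(x, y):
    """Least-squares fit of y = a + b * x; a linear stand-in for a spline."""
    if len(x) < 2:
        raise ValueError("need at least two overlap units to extrapolate")
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("overlap propensity scores are all identical")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def population_ate(ps, z, tau_hat, window=0.05, min_opposite=5):
    """Point estimate of the population average causal effect.

    ps      : estimated propensity scores, one per unit
    z       : binary exposure indicators (0/1)
    tau_hat : individual effect estimates from a flexible model; only the
              overlap-region entries are trusted and used.
    Non-overlap units get an effect extrapolated from the overlap region
    as a linear function of the propensity score.
    """
    mask = overlap_mask(ps, z, window, min_opposite)
    tau_all = list(tau_hat)
    if not all(mask):
        xs = [p for p, m in zip(ps, mask) if m]
        ys = [t for t, m in zip(tau_hat, mask) if m]
        a, b = fit_line(xs, ys)
        tau_all = [t if m else a + b * p for p, t, m in zip(ps, tau_hat, mask)]
    return fmean(tau_all), mask


if __name__ == "__main__":
    # Toy data: the last unit (ps = 0.95) has no unexposed neighbours.
    ps = [0.10, 0.12, 0.14, 0.16, 0.50, 0.52, 0.54, 0.56, 0.95]
    z = [0, 1, 0, 1, 0, 1, 0, 1, 1]
    tau_hat = [1 + 2 * p for p in ps[:-1]] + [100.0]  # last estimate untrusted
    ate, mask = population_ate(ps, z, tau_hat, window=0.05, min_opposite=1)
    print(mask, round(ate, 4))
```

With `window=0.05` and `min_opposite=1`, the isolated exposed unit at 0.95 falls outside the overlap region, so its untrusted estimate is replaced by the line fitted to the other eight units. Raising `min_opposite` or shrinking `window` makes the overlap definition stricter and moves more units into the extrapolated group. The abstract notes that the Bayesian framework yields appropriately large uncertainties in the presence of non-overlap, which this point-estimate sketch does not capture.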

Similar articles

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

Estimation of causal effects of multiple treatments in observational studies with a binary outcome.

Stat Methods Med Res. 2020 Nov;29(11):3218-3234. doi: 10.1177/0962280220921909. Epub 2020 May 25.

Mortality and Morbidity Effects of Long-Term Exposure to Low-Level PM2.5, BC, NO2, and O3: An Analysis of European Cohorts in the ELAPSE Project.

Res Rep Health Eff Inst. 2021 Sep;2021(208):1-127.

Causal Inference Methods for Estimating Long-Term Health Effects of Air Quality Regulations.

Res Rep Health Eff Inst. 2016 May(187):5-49.

Assessing Adverse Health Effects of Long-Term Exposure to Low Levels of Ambient Air Pollution: Implementation of Causal Inference Methods.

Res Rep Health Eff Inst. 2022 Jan;2022(211):1-56.

Adjustment for energy intake in nutritional research: a causal inference perspective.

Am J Clin Nutr. 2022 Jan 11;115(1):189-198. doi: 10.1093/ajcn/nqab266.

Propensity score weighting methods for causal subgroup analysis with time-to-event outcomes.

Stat Methods Med Res. 2023 Oct;32(10):1919-1935. doi: 10.1177/09622802231188517. Epub 2023 Aug 9.

Multiple imputation procedures for estimating causal effects with multiple treatments with application to the comparison of healthcare providers.

Stat Med. 2022 Jan 15;41(1):208-226. doi: 10.1002/sim.9231. Epub 2021 Nov 2.

Clarifying selection bias in cluster randomized trials.

Clin Trials. 2022 Feb;19(1):33-41. doi: 10.1177/17407745211056875. Epub 2021 Dec 11.

Cited by

Synthesis estimators for transportability with positivity violations by a continuous covariate.

J R Stat Soc Ser A Stat Soc. 2024 Sep 2;188(1):158-180. doi: 10.1093/jrsssa/qnae084. eCollection 2025 Jan.

Personalized statin treatment plan using counterfactual approach with multi-objective optimization over benefits and risks.

Inform Med Unlocked. 2023;42. doi: 10.1016/j.imu.2023.101362. Epub 2023 Oct 2.

It's electric! An environmental equity perspective on the lifecycle of our energy sources.

Environ Epidemiol. 2023 Apr 3;7(2):e246. doi: 10.1097/EE9.0000000000000246. eCollection 2023 Apr.

The Minderoo-Monaco Commission on Plastics and Human Health.

Ann Glob Health. 2023 Mar 21;89(1):23. doi: 10.5334/aogh.4056. eCollection 2023.

Core concepts in pharmacoepidemiology: Violations of the positivity assumption in the causal analysis of observational data: Consequences and statistical approaches.

Pharmacoepidemiol Drug Saf. 2021 Nov;30(11):1471-1485. doi: 10.1002/pds.5338. Epub 2021 Aug 24.

Borrowing from supplemental sources to estimate causal effects from a primary data source.

Stat Med. 2021 Oct 30;40(24):5115-5130. doi: 10.1002/sim.9114. Epub 2021 Jun 22.

A practical introduction to Bayesian estimation of causal effects: Parametric and nonparametric approaches.

Stat Med. 2021 Jan 30;40(2):518-551. doi: 10.1002/sim.8761. Epub 2020 Oct 5.

In Pursuit of Evidence in Air Pollution Epidemiology: The Role of Causally Driven Data Science.

Epidemiology. 2020 Jan;31(1):1-6. doi: 10.1097/EDE.0000000000001090.