Kern Holger L, Stuart Elizabeth A, Hill Jennifer, Green Donald P
Department of Political Science, Florida State University.
Departments of Mental Health, Biostatistics, and Health Policy and Management, Bloomberg School of Public Health, Johns Hopkins University.
J Res Educ Eff. 2016;9(1):103-127. doi: 10.1080/19345747.2015.1060282. Epub 2016 Jan 14.
Randomized experiments are considered the gold standard for causal inference, as they can provide unbiased estimates of treatment effects for the experimental participants. However, researchers and policymakers are often interested in using a specific experiment to inform decisions about other target populations. In education research, increasing attention is being paid to the potential lack of generalizability of randomized experiments, as the experimental participants may be unrepresentative of the target population of interest. This paper examines whether generalization may be assisted by statistical methods that adjust for observed differences between the experimental participants and members of a target population. The methods examined include approaches that reweight the experimental data so that participants more closely resemble the target population and methods that utilize models of the outcome. Two simulation studies and one empirical analysis investigate and compare the methods' performance. One simulation uses purely simulated data while the other utilizes data from an evaluation of a school-based dropout prevention program. Our simulations suggest that machine learning methods outperform regression-based methods when the required structural (ignorability) assumptions are satisfied. When these assumptions are violated, all of the methods examined perform poorly. Our empirical analysis uses data from a multi-site experiment to assess how well results from a given site predict impacts in other sites. Using a variety of extrapolation methods, predicted effects for each site are compared to actual benchmarks. Flexible modeling approaches perform best, although linear regression is not far behind. Taken together, these results suggest that flexible modeling techniques can aid generalization while underscoring the fact that even state-of-the-art statistical techniques still rely on strong assumptions.
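The reweighting idea described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's actual analysis: it uses synthetic data with a single hypothetical covariate `x`, fits a logistic model for the probability of being in the experimental sample, and weights each participant by the odds of not being sampled so that the weighted sample resembles the target population.

```python
import numpy as np

# Illustrative sketch (not the paper's analysis): reweight an experimental
# sample so it resembles a target population, using odds-based weights from
# a logistic model of sample membership. All data here are synthetic.
rng = np.random.default_rng(0)

n_exp, n_pop = 500, 2000
x_exp = rng.normal(0.5, 1.0, n_exp)  # experimental participants (shifted)
x_pop = rng.normal(0.0, 1.0, n_pop)  # draws from the target population

# Fit P(in experiment | x) by plain gradient descent on the logistic loss,
# so the example needs no external ML library.
x = np.concatenate([x_exp, x_pop])
s = np.concatenate([np.ones(n_exp), np.zeros(n_pop)])  # sample indicator
X = np.column_stack([np.ones_like(x), x])

beta = np.zeros(2)
for _ in range(3000):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    beta -= 0.5 * X.T @ (p - s) / len(s)

# Weight each participant by the odds of NOT being sampled given x:
# w_i = (1 - e(x_i)) / e(x_i), where e is the estimated sampling propensity.
e = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * x_exp)))
w = (1.0 - e) / e

unweighted = x_exp.mean()                  # near the sample mean of 0.5
reweighted = np.average(x_exp, weights=w)  # pulled toward the population mean
print(f"unweighted mean: {unweighted:.2f}, reweighted mean: {reweighted:.2f}")
```

The same odds-based weights would then enter a weighted treatment-effect estimate; the machine-learning variants the paper examines replace the simple logistic model with more flexible estimators of the sampling propensity or of the outcome itself.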