Outcome modelling strategies in epidemiology: traditional methods and basic alternatives.

Authors

Sander Greenland, Rhian Daniel, Neil Pearce

Affiliations

Department of Epidemiology and Department of Statistics, University of California, Los Angeles, CA, USA.

Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK.

Publication information

Int J Epidemiol. 2016 Apr;45(2):565-75. doi: 10.1093/ije/dyw040. Epub 2016 Apr 20.

DOI:10.1093/ije/dyw040

PMID:27097747

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4864881/

Abstract

Controlling for too many potential confounders can lead to or aggravate problems of data sparsity or multicollinearity, particularly when the number of covariates is large in relation to the study size. As a result, methods to reduce the number of modelled covariates are often deployed. We review several traditional modelling strategies, including stepwise regression and the 'change-in-estimate' (CIE) approach to deciding which potential confounders to include in an outcome-regression model for estimating effects of a targeted exposure. We discuss their shortcomings, and then provide some basic alternatives and refinements that do not require special macros or programming. Throughout, we assume the main goal is to derive the most accurate effect estimates obtainable from the data and commercial software. Allowing that most users must stay within standard software packages, this goal can be roughly approximated using basic methods to assess, and thereby minimize, mean squared error (MSE).
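As an illustration only, the basic change-in-estimate (CIE) screen that the abstract reviews can be sketched as follows. Everything in this sketch is an assumption rather than something taken from the article: the linear outcome model, the simulated data, the variable names (`confounder`, `noise_var`), the helper functions, and the 10% cutoff, which is a common convention and not a value stated in the abstract.

```python
import numpy as np

def exposure_coef(y, x, covariates):
    """OLS coefficient of exposure x in the model y ~ 1 + x + covariates."""
    design = np.column_stack([np.ones_like(x), x] + list(covariates))
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]  # column 0 is the intercept, column 1 is the exposure

def change_in_estimate(y, x, candidates):
    """Relative change in the exposure coefficient when each candidate
    covariate is dropped, one at a time, from the full model."""
    full = exposure_coef(y, x, candidates.values())
    changes = {}
    for name in candidates:
        reduced = [v for k, v in candidates.items() if k != name]
        changes[name] = abs(exposure_coef(y, x, reduced) - full) / abs(full)
    return full, changes

# Simulated data (hypothetical): one true confounder, one irrelevant covariate.
rng = np.random.default_rng(0)
n = 5000
confounder = rng.normal(size=n)
noise_var = rng.normal(size=n)
x = 0.8 * confounder + rng.normal(size=n)             # exposure depends on confounder
y = 1.0 * x + 1.5 * confounder + rng.normal(size=n)   # true exposure effect is 1.0

full, changes = change_in_estimate(
    y, x, {"confounder": confounder, "noise_var": noise_var}
)

CUTOFF = 0.10  # assumed conventional threshold, not taken from the article
keep = [name for name, change in changes.items() if change > CUTOFF]
print(f"full-model estimate: {full:.3f}")
print("relative changes:", {k: round(v, 4) for k, v in changes.items()})
print("covariates retained by the CIE screen:", keep)
```

Dropping a covariate trades possible confounding bias against reduced variance, and MSE combines the two as variance plus squared bias. A fixed cutoff like the one above looks only at the change in the point estimate, which is one reason a screen of this kind may fall short of the accuracy goal the abstract describes.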

Similar articles

Statistical foundations for model-based adjustments.

Annu Rev Public Health. 2015 Mar 18;36:89-108. doi: 10.1146/annurev-publhealth-031914-122559.

Causal Methods for Observational Research: A Primer.

Arch Iran Med. 2018 Apr 1;21(4):164-169.

Invited commentary: variable selection versus shrinkage in the control of multiple confounders.

Am J Epidemiol. 2008 Mar 1;167(5):523-9; discussion 530-1. doi: 10.1093/aje/kwm355. Epub 2008 Jan 27.

Identification of confounder in epidemiologic data contaminated by measurement error in covariates.

BMC Med Res Methodol. 2016 May 18;16:54. doi: 10.1186/s12874-016-0159-6.

Propensity Score-Based Estimators With Multiple Error-Prone Covariates.

Am J Epidemiol. 2019 Jan 1;188(1):222-230. doi: 10.1093/aje/kwy210.

Multiple imputation with sequential penalized regression.

Stat Methods Med Res. 2019 May;28(5):1311-1327. doi: 10.1177/0962280218755574. Epub 2018 Feb 16.

Estimating exposure effects by modelling the expectation of exposure conditional on confounders.

Biometrics. 1992 Jun;48(2):479-95.

A set of SAS macros for calculating and displaying adjusted odds ratios (with confidence intervals) for continuous covariates in logistic B-spline regression models.

Comput Methods Programs Biomed. 2008 Oct;92(1):109-14. doi: 10.1016/j.cmpb.2008.05.004. Epub 2008 Jul 7.

Statistical methods for epidemiologic studies of the health effects of air pollution.

Res Rep Health Eff Inst. 1999 May(86):1-50; discussion 51-6.

Cited by

A blood-based DNA damage signature in patients with Parkinson's disease is associated with disease progression.

Nat Aging. 2025 Sep 5. doi: 10.1038/s43587-025-00926-x.

Xietu Hemu Prescription Improves Metabolic Dysfunction-Associated Steatotic Liver Disease: A Real-World Cohort Study.

J Multidiscip Healthc. 2025 Jul 29;18:4377-4389. doi: 10.2147/JMDH.S522519. eCollection 2025.

Comparing variable and feature selection strategies for prediction - protocol of a simulation study in low-dimensional transplantation data.

PLoS One. 2025 Aug 1;20(8):e0328696. doi: 10.1371/journal.pone.0328696. eCollection 2025.

Change in lifestyle and mental health in young adults: an exploratory study with hybrid machine learning.

Front Public Health. 2025 Jun 4;13:1562280. doi: 10.3389/fpubh.2025.1562280. eCollection 2025.

Factors associated with symptom-to-surgery time in patients undergoing surgical repair for acute type A aortic dissection: an exploratory analysis from a prospective cohort study.

BMJ Surg Interv Health Technol. 2025 May 28;7(1):e000304. doi: 10.1136/bmjsit-2024-000304. eCollection 2025.

Associations of parental labour migration and childhood maltreatment with psychosocial health among adolescents and young adults in China.

Eur J Psychotraumatol. 2025 Dec;16(1):2500139. doi: 10.1080/20008066.2025.2500139. Epub 2025 May 12.

Double burden of malnutrition among under-five children in Eastern and Southern African countries.

Sci Rep. 2025 Apr 1;15(1):11042. doi: 10.1038/s41598-025-87144-y.

Self-reported diagnoses of dietary allergens and fecundability in a North American cohort.

Hum Reprod. 2025 Mar 1;40(3):553-560. doi: 10.1093/humrep/deae277.

Pre-Existing and Gestational Diabetes and Risk of Maternal Venous Thromboembolism: A Systematic Review and Meta-Analysis of Observational Studies.

BJOG. 2025 Jul;132(8):1076-1085. doi: 10.1111/1471-0528.18043. Epub 2024 Dec 17.

Pre-Pregnancy Provegetarian Food Pattern and the Risk of Developing Gestational Diabetes Mellitus: The Seguimiento Universidad de Navarra (SUN) Cohort Study.

Medicina (Kaunas). 2024 Nov 16;60(11):1881. doi: 10.3390/medicina60111881.

References

Penalization, bias reduction, and default priors in logistic and related categorical and survival regressions.

Stat Med. 2015 Oct 15;34(23):3133-43. doi: 10.1002/sim.6537. Epub 2015 May 26.

Statistical foundations for model-based adjustments.

Annu Rev Public Health. 2015 Mar 18;36:89-108. doi: 10.1146/annurev-publhealth-031914-122559.

On the definition of a confounder.

Ann Stat. 2013 Feb;41(1):196-220. doi: 10.1214/12-aos1058.

Mediation analysis allowing for exposure-mediator interactions and causal interpretation: theoretical assumptions and implementation with SAS and SPSS macros.

Psychol Methods. 2013 Jun;18(2):137-50. doi: 10.1037/a0031034. Epub 2013 Feb 4.

Bayesian regression in SAS software.

Int J Epidemiol. 2013 Feb;42(1):308-17. doi: 10.1093/ije/dys213. Epub 2012 Dec 10.

Bayesian effect estimation accounting for adjustment uncertainty.

Biometrics. 2012 Sep;68(3):661-71. doi: 10.1111/j.1541-0420.2011.01731.x. Epub 2012 Feb 24.

Effects of adjusting for instrumental variables on bias and precision of effect estimates.

Am J Epidemiol. 2011 Dec 1;174(11):1213-22. doi: 10.1093/aje/kwr364. Epub 2011 Oct 24.

On model selection and model misspecification in causal inference.

Stat Methods Med Res. 2012 Feb;21(1):7-30. doi: 10.1177/0962280210387717. Epub 2010 Nov 12.

Illustrating bias due to conditioning on a collider.

Int J Epidemiol. 2010 Apr;39(2):417-20. doi: 10.1093/ije/dyp334. Epub 2009 Nov 19.

Exhaustion, automation, theory, and confounding.

Epidemiology. 2009 Jul;20(4):523-4. doi: 10.1097/EDE.0b013e3181a82501.
