Michael Rosenblum, Mark J. van der Laan
Center for AIDS Prevention Studies, University of California, San Francisco, California 94105, USA.
Biometrics. 2009 Sep;65(3):937-45. doi: 10.1111/j.1541-0420.2008.01177.x. Epub 2009 Feb 4.
Regression models are often used to test for cause-effect relationships from data collected in randomized trials or experiments. This practice has deservedly come under heavy scrutiny, because commonly used models such as linear and logistic regression will often not capture the actual relationships between variables, and incorrectly specified models can lead to incorrect conclusions. In this article, we focus on hypothesis tests of whether the treatment given in a randomized trial has any effect on the mean of the primary outcome within strata of baseline variables such as age, sex, and health status. Our primary concern is ensuring that such hypothesis tests have the correct type I error rate in large samples. Our main result is that, for a surprisingly large class of commonly used regression models, standard regression-based hypothesis tests (but with robust variance estimators) are guaranteed to have the correct type I error rate in large samples, even when the models are incorrectly specified. To the best of our knowledge, this robustness of model-based hypothesis tests to incorrectly specified models was previously unknown for Poisson regression models and for the other commonly used models we consider. Our results have practical implications for understanding the reliability of commonly used, model-based tests for analyzing randomized trials.
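The central claim can be illustrated with a small simulation. The sketch below is hypothetical and is not the authors' code: it assumes a trial with one baseline covariate W ~ Uniform(0, 2), a 1:1 randomized treatment A, and an exponentially distributed outcome whose mean (1 + W²)·exp(δA) is not log-linear in W, so the main-terms Poisson working model log E[Y | A, W] = β₀ + β_A·A + β_W·W is wrong for both the mean and the variance. It fits that model by Newton-Raphson on the Poisson score equations, forms the robust (sandwich) covariance B⁻¹MB⁻¹, and rejects H₀: β_A = 0 when |z| > 1.96. Under δ = 0 (no effect in any stratum of W) the rejection rate should sit near the nominal 0.05 despite the misspecification; with δ = log 2 the test should usually reject. The function names, data-generating process, and sample sizes are illustrative assumptions; exactly which model forms the guarantee covers is stated in the paper itself.

```python
import math
import random


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(a)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(k):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[k:] for row in m]


def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def fit_poisson(xs, ys, iters=50, tol=1e-10):
    """Fit a log-linear (Poisson working) model by Newton-Raphson.

    Returns (beta, robust covariance), where the covariance is the sandwich
    B^-1 M B^-1 with B = sum mu x x^T and M = sum (y - mu)^2 x x^T.
    """
    k = len(xs[0])
    beta = [math.log(sum(ys) / len(ys))] + [0.0] * (k - 1)
    for _ in range(iters):
        info = [[0.0] * k for _ in range(k)]
        score = [0.0] * k
        for x, y in zip(xs, ys):
            mu = math.exp(sum(b * v for b, v in zip(beta, x)))
            for i in range(k):
                score[i] += x[i] * (y - mu)
                for j in range(k):
                    info[i][j] += mu * x[i] * x[j]
        step = [sum(r * s for r, s in zip(row, score)) for row in invert(info)]
        beta = [b + d for b, d in zip(beta, step)]
        if max(abs(d) for d in step) < tol:
            break
    # Sandwich pieces evaluated at the final estimate.
    bread = [[0.0] * k for _ in range(k)]
    meat = [[0.0] * k for _ in range(k)]
    for x, y in zip(xs, ys):
        mu = math.exp(sum(b * v for b, v in zip(beta, x)))
        for i in range(k):
            for j in range(k):
                bread[i][j] += mu * x[i] * x[j]
                meat[i][j] += (y - mu) ** 2 * x[i] * x[j]
    b_inv = invert(bread)
    return beta, matmul(matmul(b_inv, meat), b_inv)


def simulate_trial(n, effect, rng):
    """Hypothetical trial: W ~ Uniform(0, 2), A ~ Bernoulli(1/2) independent of W,
    Y exponential with mean (1 + W^2) * exp(effect * A). Design row is (1, A, W)."""
    xs, ys = [], []
    for _ in range(n):
        w = rng.uniform(0.0, 2.0)
        a = 1.0 if rng.random() < 0.5 else 0.0
        mean = (1.0 + w * w) * math.exp(effect * a)
        xs.append([1.0, a, w])
        ys.append(rng.expovariate(1.0 / mean))
    return xs, ys


def rejection_rate(effect, reps=300, n=200, seed=0):
    """Fraction of simulated trials in which the robust Wald test rejects
    H0: beta_A = 0 at two-sided level 0.05."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        xs, ys = simulate_trial(n, effect, rng)
        beta, cov = fit_poisson(xs, ys)
        z = beta[1] / math.sqrt(cov[1][1])
        if abs(z) > 1.959964:  # 97.5% standard normal quantile
            rejections += 1
    return rejections / reps
```

For example, `rejection_rate(effect=0.0)` estimates the type I error over 300 simulated trials of size 200, and `rejection_rate(effect=math.log(2.0), reps=100)` estimates power against a doubling of the mean. Because A is randomized independently of W, the working model's treatment coefficient still targets zero under the null even though the W term is wrong, which is the intuition behind the robustness result. Pure Python keeps the sketch dependency-free; in practice a statistics library's GLM fit with a robust covariance option would replace `fit_poisson`.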