Testing logistic regression coefficients with clustered data and few positive outcomes.

Author information

Hunsberger Sally, Graubard Barry I, Korn Edward L

Affiliation

Biometric Research Branch, National Cancer Institute, Bethesda, MD 20892, U.S.A.

Publication information

Stat Med. 2008 Apr 15;27(8):1305-24. doi: 10.1002/sim.3011.

PMID:17705348

Abstract

Applications frequently involve logistic regression analysis with clustered data where there are few positive outcomes in some of the independent variable categories. For example, an application is given here that analyzes the association of asthma with various demographic variables and risk factors using data from the third National Health and Nutrition Examination Survey, a weighted multistage cluster sample. Although there are 742 asthma cases in all (out of 18,395 individuals), for one of the categories of one of the independent variables there are only 25 asthma cases (out of 695 individuals). Generalized Wald and score hypothesis tests, which use appropriate cluster-level variance estimators, and a bootstrap hypothesis test have been proposed for testing logistic regression coefficients with cluster samples. When there are few positive outcomes, simulations presented in this paper show that these tests can sometimes have either inflated or very conservative levels. A simulation-based method is proposed for testing logistic regression coefficients with cluster samples when there are few positive outcomes. This testing methodology is shown to compare favorably with the generalized Wald and score tests and the bootstrap hypothesis test in terms of maintaining nominal levels. The proposed method is also useful when testing goodness-of-fit of logistic regression models using deciles-of-risk tables.
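To make the idea of a Wald test with a cluster-level variance estimator concrete, the following is a minimal sketch in Python using NumPy only. It is a generic cluster-robust (sandwich) Wald test on simulated data, not the simulation-based method the paper proposes, and it ignores the survey weights that an analysis of the actual survey data would need. The function names (`fit_logistic`, `cluster_robust_wald`), the simulated design (50 clusters of 40 individuals, roughly 4% prevalence), and the absence of any small-sample correction are all assumptions made for illustration.

```python
import numpy as np
from math import erfc, sqrt


def fit_logistic(X, y, n_iter=25, tol=1e-10):
    """Unweighted maximum-likelihood logistic regression via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # observed information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def cluster_robust_wald(X, y, clusters, j):
    """Wald test of H0: beta_j = 0 using a cluster-level sandwich variance.

    Hypothetical helper: no small-sample correction, no survey weights.
    Returns (estimate, z statistic, two-sided normal p-value).
    """
    beta = fit_logistic(X, y)
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    bread = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    scores = X * (y - p)[:, None]                     # per-observation scores
    _, idx = np.unique(clusters, return_inverse=True)
    cluster_scores = np.zeros((idx.max() + 1, X.shape[1]))
    np.add.at(cluster_scores, idx, scores)            # sum scores within cluster
    meat = cluster_scores.T @ cluster_scores
    cov = bread @ meat @ bread
    z = beta[j] / sqrt(cov[j, j])
    p_value = erfc(abs(z) / sqrt(2.0))                # 2 * (1 - Phi(|z|))
    return beta[j], z, p_value


# Simulated example (all values are assumptions): 50 clusters of 40 people,
# a shared cluster effect, and a rare binary outcome with no true x effect.
rng = np.random.default_rng(0)
n_clusters, cluster_size = 50, 40
clusters = np.repeat(np.arange(n_clusters), cluster_size)
cluster_effect = rng.normal(0.0, 0.5, n_clusters)[clusters]
x = rng.integers(0, 2, n_clusters * cluster_size).astype(float)
prob = 1.0 / (1.0 + np.exp(-(-3.2 + cluster_effect)))
y = (rng.random(n_clusters * cluster_size) < prob).astype(float)
X = np.column_stack([np.ones_like(x), x])

est, z, pval = cluster_robust_wald(X, y, clusters, j=1)
print(f"estimate={est:.3f}  z={z:.3f}  p={pval:.3f}")
```

Repeating such a simulation many times under the null hypothesis and recording how often the p-value falls below 0.05 gives an empirical level, which is the kind of quantity the abstract refers to when it describes tests as having inflated or conservative levels.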
