Greenland S, Schwartzbaum JA, Finkle WD
Department of Epidemiology, School of Public Health, University of California at Los Angeles, USA.
Am J Epidemiol. 2000 Mar 1;151(5):531-9. doi: 10.1093/oxfordjournals.aje.a010240.
Conditional logistic regression was developed to avoid "sparse-data" biases that can arise in ordinary logistic regression analysis. Nonetheless, it is a large-sample method that can exhibit considerable bias when certain types of matched sets are infrequent or when the model contains too many parameters. Sparse-data bias can cause misleading inferences about confounding, effect modification, dose response, and induction periods, and can interact with other biases. In this paper, the authors describe these problems in the context of matched case-control analysis and provide examples from a study of electrical wiring and childhood leukemia and a study of diet and glioma. The same problems can arise in any likelihood-based analysis, including ordinary logistic regression. The problems can be detected by careful inspection of data and by examining the sensitivity of estimates to category boundaries, variables in the model, and transformations of those variables. One can also apply various bias corrections or turn to methods less sensitive to sparse data than conditional likelihood, such as Bayesian and empirical-Bayes (hierarchical regression) methods.
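The large-sample nature of the conditional estimator can be illustrated with a minimal sketch that is not taken from the paper. It assumes the simplest matched design: 1:1 matched pairs with a single binary exposure. In that setting the conditional maximum-likelihood odds ratio is the ratio b/c of the two kinds of discordant pairs, and, given n discordant pairs, b follows a binomial distribution with success probability OR/(1 + OR). The function name and the choice to condition on a finite estimate (0 < b < n) are assumptions made for this illustration only.

```python
import math


def conditional_mle_log_or_bias(n_discordant: int, odds_ratio: float) -> float:
    """Exact bias of the conditional ML log odds ratio for 1:1 matched pairs.

    Illustrative assumptions (not from the paper): one binary exposure, and
    the expectation is taken only over samples where the estimate is finite,
    i.e. 0 < b < n_discordant.
    """
    if n_discordant < 2:
        raise ValueError("need at least 2 discordant pairs")
    # Probability that a discordant pair has the case exposed.
    p = odds_ratio / (1.0 + odds_ratio)
    total_prob = 0.0
    weighted_sum = 0.0
    for b in range(1, n_discordant):  # skip b = 0 and b = n (infinite estimates)
        c = n_discordant - b
        prob = math.comb(n_discordant, b) * p**b * (1.0 - p) ** c
        total_prob += prob
        weighted_sum += prob * math.log(b / c)  # conditional MLE of log OR is log(b/c)
    expected_log_or = weighted_sum / total_prob
    return expected_log_or - math.log(odds_ratio)


# Bias on the log odds ratio scale for a true odds ratio of 2,
# as the number of discordant pairs grows.
for n in (10, 20, 50, 100):
    print(n, round(conditional_mle_log_or_bias(n, 2.0), 4))
```

Under these assumptions the estimate is pushed away from the null when discordant pairs are few, and the bias shrinks as their number grows. This is the kind of sparse-data behavior the abstract describes, shown here only for a deliberately simple special case.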