Logistic regression: a brief primer.
Affiliation
Research Institute, St. Luke's Hospital and Health Network, Bethlehem, PA, USA.
Publication information
Acad Emerg Med. 2011 Oct;18(10):1099-104. doi: 10.1111/j.1553-2712.2011.01185.x.
Regression techniques are versatile in their application to medical research because they can measure associations, predict outcomes, and control for confounding variable effects. As one such technique, logistic regression is an efficient and powerful way to analyze the effect of a group of independent variables on a binary outcome by quantifying each independent variable's unique contribution. Using components of linear regression reflected in the logit scale, logistic regression iteratively identifies the strongest linear combination of variables with the greatest probability of detecting the observed outcome. Important considerations when conducting logistic regression include selecting independent variables, ensuring that relevant assumptions are met, and choosing an appropriate model building strategy. For independent variable selection, one should be guided by such factors as accepted theory, previous empirical investigations, clinical considerations, and univariate statistical analyses, with acknowledgement of potential confounding variables that should be accounted for. Basic assumptions that must be met for logistic regression include independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers. Additionally, there should be an adequate number of events per independent variable to avoid an overfit model, with commonly recommended minimum "rules of thumb" ranging from 10 to 20 events per covariate. Regarding model building strategies, the three general types are direct/standard, sequential/hierarchical, and stepwise/statistical, with each having a different emphasis and purpose. Before reaching definitive conclusions from the results of any of these methods, one should formally quantify the model's internal validity (i.e., replicability within the same data set) and external validity (i.e., generalizability beyond the current sample). 
The resulting logistic regression model's overall fit to the sample data is assessed using various goodness-of-fit measures, with better fit characterized by a smaller difference between observed and model-predicted values. Use of diagnostic statistics is also recommended to further assess the adequacy of the model. Finally, results for independent variables are typically reported as odds ratios (ORs) with 95% confidence intervals (CIs).
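As an illustration of the workflow summarized above, the following minimal sketch fits a logistic regression by Newton-Raphson (iteratively reweighted least squares) and reports each coefficient as an OR with a Wald 95% CI. It is not taken from the article: the function names `fit_logistic` and `odds_ratios` and the 2x2 data set (30 of 100 exposed and 10 of 100 unexposed patients having the outcome) are hypothetical, chosen only so the result can be checked by hand against the cross-product OR of (30 x 90) / (70 x 10), about 3.86.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, tol=1e-8):
    """Fit logistic regression by Newton-Raphson (IRLS).

    X: (n, p) array of predictors without an intercept column.
    y: (n,) array of 0/1 outcomes.
    Returns (beta, se) with the intercept as the first element.
    """
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))    # predicted probabilities
        w = p * (1.0 - p)                       # IRLS weights
        grad = X1.T @ (y - p)                   # score vector
        info = X1.T @ (X1 * w[:, None])         # Fisher information
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Standard errors from the inverse information at the final estimate
    p = 1.0 / (1.0 + np.exp(-X1 @ beta))
    info = X1.T @ (X1 * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se

def odds_ratios(beta, se, z=1.96):
    """Exponentiate coefficients into ORs with Wald 95% CI bounds."""
    return np.exp(beta), np.exp(beta - z * se), np.exp(beta + z * se)

# Hypothetical data: exposed 30 events / 70 non-events,
# unexposed 10 events / 90 non-events.
counts = [30, 70, 10, 90]
x = np.repeat([1.0, 1.0, 0.0, 0.0], counts)
y = np.repeat([1.0, 0.0, 1.0, 0.0], counts)

beta, se = fit_logistic(x[:, None], y)
or_est, ci_low, ci_high = odds_ratios(beta, se)

# Events per variable: count of the rarer outcome class per covariate
events_per_variable = min(y.sum(), (1 - y).sum()) / x[:, None].shape[1]

print(f"OR for exposure: {or_est[1]:.2f} "
      f"(95% CI {ci_low[1]:.2f} to {ci_high[1]:.2f})")
print(f"Events per variable: {events_per_variable:.0f}")
```

With a single binary predictor the fitted OR coincides with the cross-product ratio from the 2x2 table, which makes this a convenient sanity check. In practice an established statistical package would normally be used, together with the assumption checks, goodness-of-fit measures, and validation steps the abstract describes.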