Odds ratios from logistic, geometric, Poisson, and negative binomial regression models.

Author information

Department of Economics, Applied Statistics, and International Business, New Mexico State University, MSC 3CQ, PO Box 30001, Las Cruces, NM, 88003-8001, USA.

Division of Biostatistics, The Ohio State University, 1841 Neil Avenue, Columbus, OH, 43210-1240, USA.

Publication information

BMC Med Res Methodol. 2018 Oct 20;18(1):112. doi: 10.1186/s12874-018-0568-9.

DOI:10.1186/s12874-018-0568-9

PMID:30342488

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6195979/

Abstract

BACKGROUND

The odds ratio (OR) is used as an important metric of comparison of two or more groups in many biomedical applications when the data measure the presence or absence of an event or represent the frequency of its occurrence. In the latter case, researchers often dichotomize the count data into binary form and apply the well-known logistic regression technique to estimate the OR. In the process of dichotomizing the data, however, information is lost about the underlying counts which can reduce the precision of inferences on the OR.

METHODS

We propose analyzing the count data directly using regression models with the log odds link function. With this approach, the parameter estimates in the model have the exact same interpretation as in a logistic regression of the dichotomized data, yielding comparable estimates of the OR. We prove analytically, using the Fisher information matrix, that our approach produces more precise estimates of the OR than logistic regression of the dichotomized data. We also show the gains in precision using simulation studies and real-world datasets. We focus on three related distributions for count data: geometric, Poisson, and negative binomial.
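To make the idea concrete, here is a minimal sketch for the simplest case: two groups and a Poisson model, with no other covariates. It is an illustration under stated assumptions, not the authors' code. Under a Poisson model with mean λ, the probability of at least one event is 1 − e^(−λ), so the odds of an event are e^λ − 1 and the log odds link is log(e^λ − 1). With one indicator covariate the maximum likelihood estimate of λ in each group is the sample mean, so the OR can be computed in closed form, and a delta-method standard error can be compared with the usual one from the dichotomized 2×2 table. The function names and the count data below are hypothetical.

```python
import math

def poisson_log_odds(counts):
    """Log odds of Y > 0 and its delta-method SE, assuming Poisson counts."""
    n = len(counts)
    lam = sum(counts) / n                      # Poisson MLE of the mean
    log_odds = math.log(math.expm1(lam))       # odds of Y > 0 are e^lam - 1
    # d/dlam log(e^lam - 1) = e^lam / (e^lam - 1); Var(lam_hat) = lam / n
    se = math.exp(lam) / math.expm1(lam) * math.sqrt(lam / n)
    return log_odds, se

def dichotomized_log_odds(counts):
    """Log odds of Y > 0 and its SE after reducing counts to event / no event."""
    n = len(counts)
    events = sum(1 for y in counts if y > 0)
    log_odds = math.log(events / (n - events))
    se = math.sqrt(1 / events + 1 / (n - events))
    return log_odds, se

def odds_ratio(group1, group0, log_odds_fn, z=1.96):
    """OR of group1 vs group0 with a Wald interval built on the log scale."""
    lo1, se1 = log_odds_fn(group1)
    lo0, se0 = log_odds_fn(group0)
    log_or = lo1 - lo0
    se = math.sqrt(se1 ** 2 + se0 ** 2)
    return math.exp(log_or), (math.exp(log_or - z * se), math.exp(log_or + z * se))

# Hypothetical event counts per subject (illustrative only, not from the paper)
control = [0, 0, 1, 0, 2, 0, 1, 0, 0, 1]
treated = [1, 0, 2, 3, 0, 1, 2, 0, 1, 4]

or_pois, ci_pois = odds_ratio(treated, control, poisson_log_odds)
or_dich, ci_dich = odds_ratio(treated, control, dichotomized_log_odds)

# Ratio of interval widths on the log-OR scale (count model vs dichotomized)
width_ratio = math.log(ci_pois[1] / ci_pois[0]) / math.log(ci_dich[1] / ci_dich[0])

print(f"Poisson, log odds link: OR = {or_pois:.2f}, 95% CI = ({ci_pois[0]:.2f}, {ci_pois[1]:.2f})")
print(f"Dichotomized logistic:  OR = {or_dich:.2f}, 95% CI = ({ci_dich[0]:.2f}, {ci_dich[1]:.2f})")
print(f"Width ratio on the log scale: {width_ratio:.2f}")
```

Both estimates target the same quantity, the odds ratio for having at least one event, but the count-based standard error is valid only if the Poisson assumption holds. With covariates beyond a single group indicator there is no closed form, and the model would instead be fitted with a generalized linear model routine that accepts a custom link or by maximizing the likelihood numerically.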

RESULTS

In simulation studies, confidence intervals for the OR were 56-65% as wide (geometric model), 75-79% as wide (Poisson model), and 61-69% as wide (negative binomial model) as the corresponding intervals from a logistic regression of the dichotomized data. When we analyzed existing datasets using our approach, confidence intervals for the OR were up to 64% shorter (36% as wide) than those obtained by dichotomizing the data and fitting a logistic regression.

CONCLUSIONS

More precise estimates of the OR can be obtained directly from the count data by using the log odds link function. This analytic approach is easy to implement in software packages that are capable of fitting generalized linear models or of maximizing user-defined likelihood functions.
