Bias in Odds Ratios From Logistic Regression Methods With Sparse Data Sets.

Affiliations

Department of Biostatistics, Faculty of Medicine, University of Tsukuba.

Graduate School of Comprehensive Human Sciences, University of Tsukuba.

Publication information

J Epidemiol. 2023 Jun 5;33(6):265-275. doi: 10.2188/jea.JE20210089. Epub 2022 Apr 1.

Abstract

BACKGROUND

Logistic regression models are widely used to evaluate the association between a binary outcome and a set of covariates. However, when there are few study participants at the outcome and covariate levels, the models lead to bias of the odds ratio (OR) estimated using the maximum likelihood (ML) method. This bias is known as sparse data bias, and the estimated OR can yield impossibly large values because of data sparsity. However, this bias has been ignored in most epidemiological studies.
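The effect described above can be illustrated with a single binary exposure, where the ML odds ratio reduces to the cross-product ratio of a 2x2 table. The sketch below is not the simulation design used in the paper: the function names, group size `n`, and event probabilities `p0` and `p1` are assumptions chosen for illustration. It enumerates every possible table exactly and reports how often the ML estimate is zero, infinite, or undefined because of an empty cell, together with the average estimate over the remaining tables.

```python
import math


def binom_pmf(k, n, p):
    """Probability of k events among n subjects with event probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def ml_or_summary(n, p0, p1):
    """Exact summary of the ML odds ratio for n unexposed and n exposed subjects.

    p0 and p1 are the event probabilities in the unexposed and exposed groups.
    Tables with an empty cell are counted separately because the
    cross-product ratio is then zero, infinite, or undefined.
    """
    p_finite = 0.0
    p_nonfinite = 0.0
    sum_or = 0.0
    sum_log_or = 0.0
    for x0 in range(n + 1):          # events among the unexposed
        for x1 in range(n + 1):      # events among the exposed
            prob = binom_pmf(x0, n, p0) * binom_pmf(x1, n, p1)
            if x0 in (0, n) or x1 in (0, n):
                p_nonfinite += prob
                continue
            or_hat = (x1 * (n - x0)) / ((n - x1) * x0)
            p_finite += prob
            sum_or += prob * or_hat
            sum_log_or += prob * math.log(or_hat)
    return {
        "true_or": (p1 / (1 - p1)) / (p0 / (1 - p0)),
        "p_nonfinite": p_nonfinite,
        "mean_or": sum_or / p_finite,          # averaged over usable tables only
        "mean_log_or": sum_log_or / p_finite,  # averaged over usable tables only
    }


if __name__ == "__main__":
    # Same true OR of 1 in both runs; only the group size changes.
    for n in (10, 50):
        print(n, ml_or_summary(n, p0=0.1, p1=0.1))
```

With equal event probabilities the true OR is 1, so any gap between `mean_or` and 1 comes from small-sample behavior alone. Comparing `p_nonfinite` across group sizes shows how quickly empty cells become common as the data get sparser.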

METHODS

We review several methods for reducing sparse data bias in logistic regression. The primary aim is to evaluate the Bayesian methods in comparison with the classical methods, such as the ML, Firth's, and exact methods using a simulation study. We also apply these methods to a real data set.

RESULTS

Our simulation results indicate that the bias of the OR from the ML, Firth's, and exact methods is considerable. Furthermore, the Bayesian methods with hyper-g prior modeling of the prior covariance matrix for regression coefficients reduced the bias under the null hypothesis, whereas the Bayesian methods with log F-type priors reduced the bias under the alternative hypothesis.

CONCLUSION

The Bayesian methods using log F-type priors and the hyper-g prior are superior to the ML, Firth's, and exact methods when fitting logistic models to sparse data sets. Which method is preferable depends on whether the null or the alternative hypothesis holds. Sensitivity analysis is important for understanding the robustness of the results in sparse data analysis.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b94a/10165217/42de3b017054/je-33-265-g001.jpg
