Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado-Denver Anschutz Medical Campus, 13001 E. 17th Pl, Aurora, CO, USA.
School of Medicine, University of Colorado-Denver Anschutz Medical Campus, Aurora, CO, USA.
BMC Med Res Methodol. 2022 May 21;22(1):148. doi: 10.1186/s12874-022-01613-w.
Missing data prove troublesome in data analysis; at best they reduce a study's statistical power and at worst they induce bias in parameter estimates. Multiple imputation via chained equations is a popular technique for dealing with missing data. However, techniques for combining and pooling results from fitted generalized additive models (GAMs) after multiple imputation have not been well explored.
We simulated missing data under MCAR, MAR, and MNAR frameworks and used random forest and predictive mean matching imputation to investigate a variety of rules for combining GAMs after multiple imputation with binary and normally distributed outcomes. We compared multiple pooling procedures, including the "D2" method, the Cauchy combination test, and the median p-value (MPV) rule. The MPV rule simply computes and reports the median p-value across all imputations. Other ad hoc methods, such as a mean p-value rule and a single imputation method, were also investigated. The viability of these methods in pooling results from B-splines was also examined for normal outcomes. These pooling techniques were then applied to two case studies: one examining the effect of elevation on six-minute walk distance (a normal outcome) in patients with pulmonary arterial hypertension, and the other examining risk factors for intubation in hospitalized COVID-19 patients (a dichotomous outcome).
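As a rough illustration of the p-value pooling rules named above, the following minimal sketch pools one term's p-values across imputed datasets using the median rule, the mean rule, and the Cauchy combination test. It is not the authors' code. The function names, the equal weights, and the example p-values are assumptions made for illustration, and the Cauchy statistic is written in its standard form, a weighted sum of tan((0.5 - p)π) referred to a standard Cauchy distribution. The D2 method is omitted because it pools test statistics rather than p-values.

```python
import math
import statistics


def median_p_value(p_values):
    """MPV rule: report the median p-value across imputations."""
    return statistics.median(p_values)


def mean_p_value(p_values):
    """Ad hoc mean rule: report the arithmetic mean of the p-values."""
    return statistics.fmean(p_values)


def cauchy_combination(p_values, weights=None):
    """Cauchy combination test in its standard form.

    Each p-value is mapped to a standard Cauchy variate, the variates are
    averaged with weights that sum to 1, and the result is mapped back to
    a p-value. Equal weights are an assumption of this sketch.
    """
    m = len(p_values)
    if weights is None:
        weights = [1.0 / m] * m
    # p-values extremely close to 0 or 1 give very large magnitudes here.
    t = sum(w * math.tan((0.5 - p) * math.pi)
            for w, p in zip(weights, p_values))
    return 0.5 - math.atan(t) / math.pi


# Hypothetical p-values for a single smooth term, one per imputed dataset.
example_p_values = [0.03, 0.04, 0.06, 0.20, 0.01]

pooled = {
    "median": median_p_value(example_p_values),
    "mean": mean_p_value(example_p_values),
    "cauchy": cauchy_combination(example_p_values),
}
```

In practice the input list would hold the p-value for the same model term from a GAM fitted to each imputed dataset. Because the Cauchy transform is dominated by the smallest p-values, it tends to return a smaller pooled value than the median on the same inputs.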
Compared with results from generalized additive models fit on full datasets, the median p-value rule performs as well as, if not better than, the other methods examined. In situations where the alternative hypothesis is true, the Cauchy combination test appears overpowered and the other methods appear underpowered, while the median p-value rule yields results similar to those from analyses of complete data.
For pooling results after fitting GAMs to multiply imputed datasets, the median p-value is a simple yet useful approach which balances both power to detect important associations and control of Type I errors.