Tang Lu, Song Peter X-K
Department of Biostatistics, University of Pittsburgh, Pittsburgh, Pennsylvania.
Department of Biostatistics, University of Michigan, Ann Arbor, Michigan.
Biometrics. 2021 Sep;77(3):914-928. doi: 10.1111/biom.13333. Epub 2020 Jul 28.
Stratification is a commonly used approach in biomedical studies to handle sample heterogeneity arising from, for example, clinical units, patient subgroups, or missing data. A key rationale behind such an approach is to overcome potential sampling biases in statistical inference. Two issues arise with a stratification-based strategy: (i) whether individual strata are sufficiently distinct to warrant stratification, and (ii) the potential loss of statistical power caused by the sample-size attrition that stratification produces. To address these issues, we propose a penalized generalized estimating equations approach to reduce the complexity of parametric model structures caused by excessive stratification. Specifically, we develop a data-driven fusion learning approach for longitudinal data that improves estimation efficiency by integrating information across similar strata, while still allowing the separation necessary for stratum-specific conclusions. The proposed method is evaluated through simulation studies and applied to a motivating psychiatric study to demonstrate its usefulness in real-world settings.
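To make "fusion learning across strata" concrete, one common way to write a fusion-penalized GEE is sketched below. This is an assumed generic form, not necessarily the authors' exact penalty, scaling, or algorithm. Here strata are indexed by $k = 1,\dots,K$, stratum $k$ has $n_k$ subjects with repeated outcomes $y_{ki}$, mean $\mu_{ki}(\beta_k)$, derivative matrix $D_{ki} = \partial \mu_{ki} / \partial \beta_k^{\top}$, and working covariance $V_{ki}$; $p_\lambda$ is a sparsity-inducing penalty such as SCAD or MCP (also an assumption), and the derivative of the nonsmooth penalty is read as a subgradient.

```latex
\begin{aligned}
U_k(\beta_k) &= \sum_{i=1}^{n_k} D_{ki}^{\top} V_{ki}^{-1}\{y_{ki} - \mu_{ki}(\beta_k)\}, \qquad k = 1,\dots,K, \\
P_\lambda(\beta) &= \sum_{1 \le k < l \le K} p_\lambda\big(\lVert \beta_k - \beta_l \rVert_2\big), \\
0 &= U_k(\hat\beta_k) - N \, \frac{\partial P_\lambda(\hat\beta)}{\partial \beta_k}, \qquad N = \sum_{k=1}^{K} n_k .
\end{aligned}
```

When the penalty drives $\hat\beta_k = \hat\beta_l$, strata $k$ and $l$ are fused: adding their two equations cancels the pair's own fusion term, so the shared estimate is driven by $U_k + U_l$, that is, by the pooled subjects of both strata. This pooling is the source of the efficiency gain, while strata whose estimates stay apart keep their stratum-specific coefficients.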