Department of Epidemiology & Biostatistics, University of California at San Francisco, San Francisco, California, USA.
Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, USA.
Stat Med. 2021 Oct 15;40(23):4915-4930. doi: 10.1002/sim.9101. Epub 2021 Jun 16.
Synthesizing external aggregated information has been shown to improve estimation efficiency when a statistical analysis relies on a limited amount of data. In this paper, we develop a unified framework for combining information from high-dimensional individual-level data and potentially low-dimensional external aggregate data under the Cox model. We summarize various forms of external aggregated information as population estimating equations and propose a penalized empirical likelihood approach to borrow information from these estimating equations. The proposed methods are flexible enough to handle the case where the individual-level data and the external aggregate data come from heterogeneous populations. Specifically, a penalized empirical likelihood ratio test is developed to check for potential heterogeneity, and a semiparametric density ratio model is postulated to account for it. Moreover, we study the impact of uncertainty in the auxiliary information on the efficiency gain and propose a modified variance estimator that adjusts for this uncertainty. The proposed estimators enjoy the oracle property and are asymptotically more efficient than the penalized partial likelihood estimator that does not exploit the external aggregated information. Simulation studies show improvements over competing methods in both estimation efficiency and variable selection. For illustration, the proposed approaches are applied to the analysis of a pediatric kidney transplant study.
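The basic empirical likelihood idea of borrowing information from an external estimating equation is to reweight the individual-level observations so that they satisfy that equation. The following is a minimal, unpenalized sketch of this reweighting step. It assumes a single scalar estimating function g_i = x_i - μ, where μ is a hypothetical externally reported mean of a covariate. It is not the authors' penalized empirical likelihood for the Cox model, and the function name `el_weights` and the toy ages are made up for illustration.

```python
def el_weights(g, tol=1e-12, max_iter=200):
    """Empirical likelihood weights p_i = 1 / (n * (1 + lam * g_i)) that
    satisfy sum_i p_i * g_i = 0 for a scalar estimating function g.
    (Hypothetical helper; single constraint, no penalty.)"""
    n = len(g)
    g_max, g_min = max(g), min(g)
    if not (g_min < 0 < g_max):
        raise ValueError("0 must lie strictly inside the range of g")
    # f(lam) = sum g_i / (1 + lam * g_i) is strictly decreasing on the
    # interval where every 1 + lam * g_i > 0, i.e. (-1/g_max, -1/g_min),
    # going from +inf to -inf, so bisection brackets the unique root.
    lo, hi = -1.0 / g_max, -1.0 / g_min
    eps = 1e-10 * (hi - lo)
    lo, hi = lo + eps, hi - eps

    def f(lam):
        return sum(gi / (1.0 + lam * gi) for gi in g)

    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid  # root lies to the right because f is decreasing
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return [1.0 / (n * (1.0 + lam * gi)) for gi in g]


# Toy example: made-up ages and a hypothetical external mean age of 40.
ages = [34.0, 41.0, 29.0, 55.0, 47.0, 38.0]
mu_ext = 40.0
w = el_weights([a - mu_ext for a in ages])
print(round(sum(w), 6))                                 # 1.0
print(round(sum(wi * a for wi, a in zip(w, ages)), 6))  # 40.0
```

The unweighted sample mean of the toy ages is about 40.67. The reweighted mean matches the external value exactly. The weights also sum to one automatically once the constraint holds, because sum_i 1/(1 + λg_i) = n - λ sum_i g_i/(1 + λg_i). Bisection is used here only for simplicity. A Newton step on λ would converge faster, but it needs safeguards to keep every 1 + λg_i positive.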