Han Peisong, Taylor Jeremy M G, Mukherjee Bhramar
Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA.
Can J Stat. 2023 Jun;51(2):355-374. doi: 10.1002/cjs.11701. Epub 2022 Apr 15.
Consider the setting where (i) individual-level data are collected to build a regression model for the association between an event of interest and certain covariates, and (ii) some risk calculators predicting the risk of the event using less detailed covariates are available, possibly as algorithmic black boxes with little information available about how they were built. We propose a general empirical-likelihood-based framework to integrate the rich auxiliary information contained in the calculators into fitting the regression model, to make the estimation of regression parameters more efficient. Two methods are developed, one using working models to extract the calculator information and one making direct use of calculator predictions without working models. Theoretical and numerical investigations show that the calculator information can substantially reduce the variance of regression parameter estimation. As an application, we study the dependence of the risk of high-grade prostate cancer on both conventional risk factors and newly identified molecular biomarkers by integrating information from the Prostate Biopsy Collaborative Group (PBCG) risk calculator, which was built on conventional risk factors alone.
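To make the idea of integrating calculator information through empirical likelihood more concrete, the following is a generic sketch of how such a constrained estimation problem is commonly written. It is not taken from the paper and may differ from the authors' formulation. All symbols are assumptions introduced for illustration: Y_i is the binary event, X_i the less detailed covariates used by the calculator, Z_i the additional covariates, f the regression model with parameter \beta, \mu its model-based event probability, h the calculator's predicted risk, a a user-chosen vector of functions of X, and p_i the empirical-likelihood weights. The constraint assumes that the calculator is calibrated for the population from which the individual-level data are drawn.

```latex
% Illustrative only: a generic empirical-likelihood formulation with a
% calibration-type constraint. All notation is assumed, not from the paper.
\[
  \max_{\beta,\; p_1,\dots,p_n}\;
  \sum_{i=1}^{n} \log p_i \;+\; \sum_{i=1}^{n} \log f(Y_i \mid X_i, Z_i; \beta)
\]
\[
  \text{subject to}\quad
  p_i \ge 0, \qquad
  \sum_{i=1}^{n} p_i = 1, \qquad
  \sum_{i=1}^{n} p_i \, g(X_i, Z_i; \beta) = 0,
\]
\[
  \text{where}\quad
  g(X, Z; \beta) = \bigl\{ \mu(X, Z; \beta) - h(X) \bigr\}\, a(X),
  \qquad
  \mu(X, Z; \beta) = P(Y = 1 \mid X, Z; \beta).
\]
```

In a formulation of this kind, the constraint restricts \beta to values for which the fitted model agrees on average with the calculator's predictions. That restriction is the source of the variance reduction the abstract describes.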