Liang Liang, Carroll Raymond, Ma Yanyuan
Department of Biostatistics, Harvard University, Boston, MA 02115, USA
Department of Statistics, Texas A&M University, 3143 TAMU, College Station, TX 77843, USA, and School of Mathematical and Physical Sciences, University of Technology Sydney, PO Box 123, Broadway NSW 2007, Australia
Electron J Stat. 2018;12(1):1782-1821. doi: 10.1214/18-EJS1446. Epub 2018 Jun 12.
Studying the relationship between covariates based on retrospective data is the main purpose of secondary analysis, an area of increasing interest. We examine the secondary analysis problem when multiple covariates are available, while only a regression mean model is specified. Despite the completely parametric modeling of the regression mean function, the case-control nature of the data requires special treatment, and semi-parametric efficient estimation generates various nonparametric estimation problems with multivariate covariates. We devise a dimension reduction approach that fits with the specified primary and secondary models in the original problem setting, and use reweighting to adjust for the case-control nature of the data, even when the disease rate in the source population is unknown. The resulting estimator is both locally efficient and robust against the misspecification of the regression error distribution, which can be heteroscedastic as well as non-Gaussian. We demonstrate the advantage of our method over several existing methods, both analytically and numerically.
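To illustrate why the case-control design needs special treatment in a secondary analysis, the following minimal sketch compares an unweighted least-squares fit with a reweighted one on simulated data. It is not the estimator proposed in the paper: it uses plain inverse probability weighting, assumes the disease rate in the source population is known (the paper also covers the unknown case), and makes no claim of efficiency. The simulated population, the coefficient values, the sample sizes, and the helper `wls` are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical source population; every numeric value here is an assumption.
N = 200_000
beta_true = np.array([1.0, 0.5])  # intercept and slope of E[Y | X]
x = rng.normal(size=N)
y = beta_true[0] + beta_true[1] * x + rng.normal(size=N)

# Disease status D depends on both X and Y (logistic primary model),
# so sampling on D distorts the Y-X relationship in the sample.
p_disease = 1.0 / (1.0 + np.exp(-(-5.0 + 0.5 * x + 1.0 * y)))
d = rng.random(N) < p_disease
prevalence = d.mean()  # treated as known in this sketch

# Case-control sampling: fixed numbers of cases and controls.
n1 = n0 = 2000
cases = rng.choice(np.flatnonzero(d), size=n1, replace=False)
controls = rng.choice(np.flatnonzero(~d), size=n0, replace=False)
idx = np.concatenate([cases, controls])
xs, ys, ds = x[idx], y[idx], d[idx]


def wls(x, y, w):
    """Weighted least squares for the mean model E[Y | X] = b0 + b1 * X."""
    X = np.column_stack([np.ones_like(x), x])
    XtW = X.T * w  # scales each observation's column by its weight
    return np.linalg.solve(XtW @ X, XtW @ y)


# Naive fit: ignores the sampling design.
naive = wls(xs, ys, np.ones(len(xs)))

# Reweighted fit: each subject is weighted by
# (population share of its disease group) / (sample share of that group).
sample_case_frac = n1 / (n1 + n0)
weights = np.where(
    ds,
    prevalence / sample_case_frac,
    (1.0 - prevalence) / (1.0 - sample_case_frac),
)
ipw = wls(xs, ys, weights)

print("true slope:      ", beta_true[1])
print("naive slope:     ", naive[1])
print("reweighted slope:", ipw[1])
```

In this setup the naive slope is pulled away from the true value because cases are heavily over-represented, while the reweighted fit targets the population regression. The reweighted fit gives up precision, since the cases receive very small weights, which is one motivation for the more efficient estimators the abstract describes.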