Derkach Andriy, Pfeiffer Ruth M, Chen Ting-Huei, Sampson Joshua N
Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, Maryland.
Department of Mathematics and Statistics, Laval University, Quebec City, Canada.
Biometrics. 2019 Sep;75(3):745-756. doi: 10.1111/biom.13053. Epub 2019 May 5.
We propose a model for high-dimensional mediation analysis that includes latent variables. We describe our model in the context of an epidemiologic study of incident breast cancer with one exposure and a large number of biomarkers (i.e., potential mediators). We assume that the exposure directly influences a group of latent, or unmeasured, factors that are associated with both the outcome and a subset of the biomarkers. The biomarkers associated with the latent factors linking the exposure to the outcome are considered "mediators." We derive the likelihood for this model and develop an expectation-maximization algorithm to maximize an L1-penalized version of this likelihood, which limits the number of factors and associated biomarkers. We show that the resulting estimates are consistent and that the estimates of the nonzero parameters have an asymptotically normal distribution. In simulations, procedures based on this new model can have significantly higher power for detecting the mediating biomarkers than simpler approaches. We apply our method to a study that evaluates the relationship between body mass index, 481 metabolic measurements, and estrogen receptor-positive breast cancer.
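The structure described in the abstract (exposure, then latent factors, then both the outcome and a subset of biomarkers) can be illustrated with a small simulation. This is only a sketch under assumptions that are not taken from the paper: Gaussian noise, a logistic outcome model, and arbitrary values for the hypothetical parameters `alpha`, `loadings`, and `beta`. It generates data from such a model and does not implement the authors' penalized EM algorithm.

```python
import numpy as np


def simulate_latent_mediation(n=500, p=50, k=2, n_mediators=5, seed=0):
    """Simulate exposure -> latent factors -> (biomarkers, outcome).

    All functional forms and parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # One exposure per subject (e.g., a standardized body mass index).
    x = rng.normal(size=n)

    # The exposure shifts each of the k latent (unmeasured) factors.
    alpha = np.full(k, 0.8)
    factors = np.outer(x, alpha) + rng.normal(size=(n, k))

    # Sparse loadings: only the first n_mediators biomarkers load on the
    # factors; the remaining biomarkers are pure noise.
    loadings = np.zeros((k, p))
    loadings[:, :n_mediators] = 1.0
    biomarkers = factors @ loadings + rng.normal(size=(n, p))

    # Binary outcome depends on the exposure only through the factors.
    beta = np.full(k, 0.5)
    logits = -1.0 + factors @ beta
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))

    return x, biomarkers, y, loadings


x, biomarkers, y, loadings = simulate_latent_mediation()

# Biomarkers with a nonzero loading are the "mediators" in this sketch.
is_mediator = np.any(loadings != 0, axis=0)

# Marginal correlation of each biomarker with the exposure.
corr = np.array(
    [np.corrcoef(x, biomarkers[:, j])[0, 1] for j in range(biomarkers.shape[1])]
)

print("mean |corr| with exposure, mediators:    ", np.abs(corr[is_mediator]).mean())
print("mean |corr| with exposure, non-mediators:", np.abs(corr[~is_mediator]).mean())
```

In data generated this way, the zero entries of `loadings` are the kind of sparsity that an L1 penalty on the likelihood is intended to recover, which is how the penalty would limit the number of biomarkers flagged as mediators.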