Xu Zhichao, Li Chunlin, Chi Sunyi, Yang Tianzhong, Wei Peng
Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, U.S.A.
Department of Statistics, Iowa State University, Ames, Iowa 50011, U.S.A.
bioRxiv. 2024 Sep 21:2023.02.06.527391. doi: 10.1101/2023.02.06.527391.
Mediation analysis is a useful tool for investigating how molecular phenotypes such as gene expression mediate the effect of an exposure on health outcomes. However, with high-dimensional omics mediators, commonly used mean-based total mediation effect measures may suffer from cancellation of component-wise mediation effects that act in opposite directions. To overcome this limitation, we recently proposed a variance-based R-squared total mediation effect measure, which relies on the computationally intensive nonparametric bootstrap for confidence interval estimation. In the work described herein, we formulated a more efficient two-stage, cross-fitted estimation procedure for the measure. To avoid potential bias, we performed iterative Sure Independence Screening (iSIS) in two subsamples to exclude the non-mediators, followed by ordinary least squares regressions for the variance estimation. We then constructed confidence intervals based on the newly derived closed-form asymptotic distribution of the measure. Extensive simulation studies demonstrated that the proposed procedure is much more computationally efficient than the resampling-based method, with comparable coverage probability. Furthermore, when applied to the Framingham Heart Study, the proposed method replicated the established finding of gene expression mediating age-related variation in systolic blood pressure and identified the role of gene expression profiles in the relationship between sex and high-density lipoprotein cholesterol level. The proposed estimation procedure is implemented in the R package CFR2M.
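The two-stage, cross-fitted idea described above can be illustrated with a minimal Python sketch. It is not the CFR2M implementation and does not reproduce that package's interface. Several pieces are assumptions made purely for illustration: the measure is taken to be R2(Y~M) + R2(Y~X) - R2(Y~M+X), a common form of the R-squared mediation measure that the abstract itself does not spell out; a plain marginal-correlation screen stands in for iSIS; the two cross-fitted estimates are simply averaged; and all function names and the simulated data are hypothetical. The closed-form confidence interval is not attempted here because the abstract does not give the asymptotic distribution.

```python
import numpy as np


def r_squared(y, design):
    """R-squared of an OLS fit of y on `design` (an intercept is added here)."""
    A = np.column_stack([np.ones(len(y)), design])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid.var() / y.var()


def screen(M, y, keep):
    """Marginal screening (a simple stand-in for iSIS, not iSIS itself):
    rank mediators by |corr(M_j, y)| and return the top `keep` column indices."""
    Mc = M - M.mean(axis=0)
    yc = y - y.mean()
    score = np.abs(Mc.T @ yc) / (np.linalg.norm(Mc, axis=0) * np.linalg.norm(yc))
    return np.argsort(score)[::-1][:keep]


def r2_mediated(x, M, y):
    """Assumed form of the measure: R2(Y~M) + R2(Y~X) - R2(Y~M+X)."""
    return (
        r_squared(y, M)
        + r_squared(y, x[:, None])
        - r_squared(y, np.column_stack([M, x]))
    )


def cross_fitted_r2_mediated(x, M, y, keep=10, seed=0):
    """Screen on one half, estimate by OLS on the other, swap roles, average."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    half_a, half_b = idx[: len(y) // 2], idx[len(y) // 2:]
    estimates = []
    for sel, est in ((half_a, half_b), (half_b, half_a)):
        chosen = screen(M[sel], y[sel], keep)  # stage 1: variable selection
        estimates.append(r2_mediated(x[est], M[est][:, chosen], y[est]))  # stage 2
    return float(np.mean(estimates))  # averaging is an assumption of this sketch


# Hypothetical simulated data: 4 true mediators among 50 candidates.
rng = np.random.default_rng(1)
n, p = 400, 50
x = rng.normal(size=n)
alpha = np.zeros(p)
alpha[:4] = 1.0                                  # exposure -> mediator effects
M = np.outer(x, alpha) + rng.normal(size=(n, p))
y = M[:, :4].sum(axis=1) + rng.normal(size=n)    # mediator -> outcome effects

r2_hat = cross_fitted_r2_mediated(x, M, y)
print(f"cross-fitted R2 mediated estimate: {r2_hat:.3f}")
```

Selecting mediators on one subsample and estimating variances on the other keeps the selection step from reusing the data that feed the OLS fits, which is the source of bias the two-stage design is meant to avoid.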