Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, USA.
Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, USA.
BMC Bioinformatics. 2021 Aug 23;22(1):414. doi: 10.1186/s12859-021-04322-1.
Environmental exposures can regulate intermediate molecular phenotypes, such as gene expression, by different mechanisms and thereby lead to various health outcomes. It is of significant scientific interest to unravel the role of potentially high-dimensional intermediate phenotypes in the relationship between environmental exposure and traits. Mediation analysis is an important tool for investigating such relationships. However, it has mainly focused on low-dimensional settings, and there is a lack of a good measure of the total mediation effect. Here, we extend an R-squared (R²) effect size measure, originally proposed in the single-mediator setting, to the moderate- and high-dimensional mediator settings in the mixed model framework.
Based on extensive simulations, we compare our measure and estimation procedure with several frequently used mediation measures, including product, proportion, and ratio measures. Our R²-based second-moment measure has small bias and variance under the correctly specified model. To mitigate potential bias induced by non-mediators, we examine two variable selection procedures, i.e., iterative sure independence screening and false discovery rate control, to exclude the non-mediators. We establish the consistency of the proposed estimation procedures and introduce a resampling-based confidence interval. By applying the proposed estimation procedure, we found that 38% of the age-related variations in systolic blood pressure can be explained by gene expression profiles in the Framingham Heart Study of 1711 individuals. An R package "RsqMed" is available on CRAN.
R-squared (R²) is an effective and efficient measure of the total mediation effect, especially in high-dimensional settings.
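As a rough illustration of the kind of quantity discussed above, the following Python sketch computes an R²-type mediation measure in a low-dimensional setting, assuming the commonly used single-mediator form R²_med = R²(Y~M) + R²(Y~X) - R²(Y~M+X). It uses ordinary least squares, not the mixed-model estimator or the variable selection steps described in the abstract, and it does not reproduce the RsqMed package interface. The function names `r_squared` and `r2_mediated` and the simulated data are hypothetical and chosen only for this example.

```python
import numpy as np


def r_squared(y, X):
    """R-squared from an ordinary least-squares fit of y on X plus an intercept."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


def r2_mediated(y, x, M):
    """Assumed form: R2(Y~M) + R2(Y~X) - R2(Y~M+X), fitted by OLS (illustration only)."""
    M = np.asarray(M, dtype=float)
    if M.ndim == 1:
        M = M[:, None]
    r2_m = r_squared(y, M)
    r2_x = r_squared(y, x)
    r2_mx = r_squared(y, np.column_stack([M, x]))
    return r2_m + r2_x - r2_mx


# Simulated data (hypothetical): the exposure x acts on y only through three mediators.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
M = 0.5 * x[:, None] + rng.normal(size=(n, 3))
y = M @ np.array([0.4, 0.3, 0.2]) + rng.normal(size=n)

estimate = r2_mediated(y, x, M)
print(f"R2 mediated: {estimate:.3f}  |  R2 of y on x alone: {r_squared(y, x):.3f}")
```

Because the simulated exposure has no direct effect on the outcome, the mediated R² should come out close to the R² of the outcome on the exposure alone. With many mediators relative to the sample size, plain OLS is no longer adequate, which is the situation the mixed-model approach in the abstract is meant to address.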