Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Rd, 200240 Shanghai, China.
Department of Developmental and Behavioral Pediatrics, Pediatric Translational Medicine Institute, National Children's Medical Center, Shanghai Children's Medical Center, School of Medicine, Shanghai Jiao Tong University, 1678 Dongfang Rd, 200127 Shanghai, China.
Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae496.
Mediation analysis is widely used to identify potential pathways connecting exposures and outcomes. However, analytical methods for high-dimensional mediation analysis in longitudinal data are still lacking. To address this gap, we propose a novel approach that combines variable selection with indirect effect (IE) assessment, based on both the linear mixed-effect model and the generalized estimating equation. First, we employ sure independence screening to reduce the dimension of the candidate mediators. We then apply the Sobel test with Bonferroni correction for IE hypothesis testing. In extensive simulation studies, the proposed procedure achieves a higher F$_{1}$ score (0.8056 and 0.9983 at sample sizes of 150 and 500, respectively) than the linear method (0.7779 and 0.9642 at the same sample sizes), along with more accurate parameter estimation and a significantly lower false discovery rate. Finally, we apply the method to over 730 000 DNA methylation sites to explore mechanisms that may mediate the relationship between paternal body mass index (BMI) and offspring BMI during growth in the Shanghai sleeping birth cohort data, identifying two previously undiscovered mediating CpG sites.
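The testing step can be illustrated with a minimal sketch. It assumes the path coefficients (exposure to mediator, `a`, and mediator to outcome, `b`) and their standard errors have already been estimated, for example from a linear mixed-effect model or generalized estimating equation fit; that fitting step is not shown. The function names, mediator labels, and numbers below are hypothetical and are not taken from the study.

```python
import math


def sobel_test(a, se_a, b, se_b):
    """Sobel z statistic and two-sided p-value for the indirect effect a*b."""
    ie = a * b
    # First-order (delta-method) standard error of the product a*b.
    se_ie = math.sqrt(a**2 * se_b**2 + b**2 * se_a**2)
    z = ie / se_ie
    # Two-sided p-value under a standard normal reference distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


def bonferroni_select(estimates, alpha=0.05):
    """Keep mediators whose Sobel p-value is below alpha / m.

    estimates maps a mediator name to (a, se_a, b, se_b); m is the number
    of mediators tested, i.e. those that survived the screening step.
    """
    threshold = alpha / len(estimates)
    selected = []
    for name, (a, se_a, b, se_b) in estimates.items():
        _, p = sobel_test(a, se_a, b, se_b)
        if p < threshold:
            selected.append(name)
    return selected


# Hypothetical path estimates for two screened candidate mediators.
estimates = {
    "site_A": (0.50, 0.10, 0.40, 0.10),
    "site_B": (0.05, 0.10, 0.40, 0.10),
}
selected = bonferroni_select(estimates)
```

With these illustrative inputs, only `site_A` passes the Bonferroni-corrected threshold, because its exposure-to-mediator path is large relative to its standard error.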