Tian Peixin, Yao Minhao, Huang Tao, Liu Zhonghua
Department of Statistics and Actuarial Science, The University of Hong Kong, Pokfulam, Hong Kong SAR 999077, China.
Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191, China.
Bioinformatics. 2022 Nov 30;38(23):5229-5235. doi: 10.1093/bioinformatics/btac687.
It is of scientific interest to identify DNA methylation CpG sites that might mediate the effect of an environmental exposure on a survival outcome, a problem of high-dimensional mediation analysis. However, there is a lack of powerful statistical methods that guarantee false discovery rate (FDR) control in finite-sample settings.
In this article, we propose a novel method called CoxMKF, which applies aggregation of multiple knockoffs to a Cox proportional hazards model for a survival outcome with high-dimensional mediators. CoxMKF achieves FDR control even in finite-sample settings, which is particularly advantageous when the sample size is small. Moreover, by aggregating multiple knockoff draws, CoxMKF overcomes the instability caused by the randomness of a single draw of model-X knockoffs. Our simulation results show that CoxMKF controls the FDR well in finite samples. We further apply CoxMKF to a lung cancer dataset from The Cancer Genome Atlas (TCGA) project with 754 subjects and 365,306 DNA methylation CpG sites, and identify four CpG sites that might mediate the effect of smoking on overall survival among lung cancer patients.
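To illustrate what "aggregation of multiple knockoffs" means, here is a minimal sketch of the generic aggregation-of-multiple-knockoffs (AKO) recipe, not the CoxMKF implementation. It assumes the knockoff statistics W_j (one vector per knockoff draw) have already been computed, for example from a penalized Cox model fitted on the mediators and their knockoffs; knockoff generation, model fitting, and any screening step are not shown. Each draw is converted into intermediate p-values, these are combined across draws by quantile aggregation, and a Benjamini-Hochberg step-up procedure selects variables. The function names, the aggregation level gamma = 0.5, and the choice of Benjamini-Hochberg are illustrative assumptions; the R package CoxMKF is the authoritative implementation.

```python
import math
from typing import List, Sequence


def intermediate_pvalues(w: Sequence[float]) -> List[float]:
    """Turn one draw of knockoff statistics W_1..W_p into p-values.

    For W_j > 0: (1 + #{k : W_k <= -W_j}) / p; otherwise 1.
    """
    p = len(w)
    out = []
    for wj in w:
        if wj > 0:
            count = sum(1 for wk in w if wk <= -wj)
            out.append((1 + count) / p)
        else:
            out.append(1.0)
    return out


def quantile(values: Sequence[float], q: float) -> float:
    """Empirical quantile with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def quantile_aggregate(pvals_by_draw: List[List[float]], gamma: float = 0.5) -> List[float]:
    """Combine p-values across B draws: min(1, gamma-quantile of p_j^(b) / gamma)."""
    p = len(pvals_by_draw[0])
    agg = []
    for j in range(p):
        scaled = [draw[j] / gamma for draw in pvals_by_draw]
        agg.append(min(1.0, quantile(scaled, gamma)))
    return agg


def benjamini_hochberg(pvals: Sequence[float], alpha: float) -> List[int]:
    """Return indices selected by the Benjamini-Hochberg step-up rule at level alpha."""
    p = len(pvals)
    order = sorted(range(p), key=lambda j: pvals[j])  # stable: ties keep index order
    k_max = 0
    for rank, j in enumerate(order, start=1):
        if pvals[j] <= alpha * rank / p:
            k_max = rank
    return sorted(order[:k_max])


def ako_select(w_draws: List[List[float]], alpha: float = 0.1, gamma: float = 0.5) -> List[int]:
    """Hypothetical end-to-end helper: W statistics from B draws -> selected indices."""
    pvals = [intermediate_pvalues(w) for w in w_draws]
    return benjamini_hochberg(quantile_aggregate(pvals, gamma), alpha)


if __name__ == "__main__":
    # Toy input: 30 strong signals plus 70 symmetric noise statistics, 5 draws.
    draws = [[5.0 + b] * 30 + [0.5 if k % 2 == 0 else -0.5 for k in range(70)]
             for b in range(5)]
    print(ako_select(draws, alpha=0.1))
```

Because each intermediate p-value is at least 1/p, this kind of procedure can only make discoveries when enough variables carry strong signal, which mirrors the offset in the knockoff+ threshold; aggregating over draws is what reduces the dependence of the selected set on any single random knockoff copy.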
The R package CoxMKF is publicly available at https://github.com/MinhaoYaooo/CoxMKF.
Supplementary data are available at Bioinformatics online.