Department of Statistics, Feng Chia University, 40724 Taichung, Taiwan.
Front Biosci (Landmark Ed). 2022 Jul 18;27(8):225. doi: 10.31083/j.fbl2708225.
In biomedical and epidemiological studies, gene-environment (G-E) interactions play an important role in the etiology and progression of many complex diseases. For ultra-high-dimensional survival genomic data, two common approaches, marginal and joint models, have been proposed to identify important interaction biomarkers. Most existing methods for detecting G-E interactions, such as the marginal Cox model and the marginal accelerated failure time model, lack robustness to contamination and outliers in both the response outcome and the predictive biomarkers. In particular, right-censored survival outcomes and an ultra-high-dimensional feature space make relevant feature screening even more challenging.
In this paper, we use the non-parametric Kendall's partial correlation to obtain a pure correlation, which determines the importance of G-E interactions for clinical survival data under a marginal modeling framework.
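To make the idea of a "pure" (partial) correlation concrete, the sketch below computes Kendall's partial tau, τ_xy·z = (τ_xy − τ_xz·τ_yz) / √((1 − τ_xz²)(1 − τ_yz²)), and uses it to rank candidate G-E interaction terms. It is a minimal illustration under stated assumptions, not the authors' procedure: the abstract does not say how censoring is handled or which variables are conditioned on, so the censored tau below borrows the comparable-pair rule of Harrell's concordance index, and each interaction G_j·E is scored conditional on its gene main effect G_j only. All function names (`kendall_tau`, `censored_kendall_tau`, `partial_tau`, `screen_interactions`) are hypothetical.

```python
import math
from itertools import combinations


def sign(v):
    """Return +1, 0 or -1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau(x, y):
    """Kendall's tau-a for two fully observed variables (tied pairs count as 0)."""
    n = len(x)
    s = sum(sign(x[i] - x[j]) * sign(y[i] - y[j])
            for i, j in combinations(range(n), 2))
    return s / (n * (n - 1) / 2)


def censored_kendall_tau(time, event, x):
    """Kendall-type association between a covariate x and a right-censored time.

    Assumption (not from the paper): a pair (i, j) is usable only when the
    shorter observed time is an event (event == 1), as in Harrell's
    concordance index; pairs with tied times are skipped.
    """
    s, usable = 0, 0
    for i, j in combinations(range(len(time)), 2):
        if time[i] == time[j]:
            continue
        shorter = i if time[i] < time[j] else j
        if event[shorter] != 1:
            continue  # ordering of the true times is unknown for this pair
        usable += 1
        s += sign(time[i] - time[j]) * sign(x[i] - x[j])
    return s / usable if usable else 0.0


def partial_tau(t_xy, t_xz, t_yz):
    """Kendall's partial tau of x and y given z (0.0 if the denominator vanishes)."""
    denom = math.sqrt((1 - t_xz ** 2) * (1 - t_yz ** 2))
    return (t_xy - t_xz * t_yz) / denom if denom > 0 else 0.0


def screen_interactions(time, event, genes, env, top_k):
    """Rank hypothetical interaction terms G_j*E by |partial tau| with survival time.

    genes: list of per-gene covariate lists; env: one environmental covariate.
    Each interaction is conditioned on its gene main effect G_j only (assumption).
    Returns the indices of the top_k genes.
    """
    scores = []
    for j, g in enumerate(genes):
        w = [gi * ei for gi, ei in zip(g, env)]      # interaction term G_j * E
        t_tw = censored_kendall_tau(time, event, w)  # time vs interaction
        t_tg = censored_kendall_tau(time, event, g)  # time vs main effect
        t_wg = kendall_tau(w, g)                     # interaction vs main effect
        scores.append((abs(partial_tau(t_tw, t_tg, t_wg)), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:top_k]]
```

For example, `screen_interactions(time, event, genes, env, top_k=50)` returns the indices of the 50 genes whose interaction with E has the largest absolute partial tau. The pairwise loops cost O(n²) per feature, which is acceptable for illustration but slow for genuinely ultra-high-dimensional screening.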
We conduct a series of simulation scenarios to compare the performance of our proposed method (Kendall's partial correlation) with commonly used methods (the marginal Cox model, the marginal accelerated failure time model, and the censoring quantile partial correlation approach). In real data applications, we apply Kendall's partial correlation to clinical survival and genetic data from The Cancer Genome Atlas (TCGA) to identify G-E interactions related to the clinical survival outcomes of patients with esophageal, pancreatic, and lung carcinomas, and we further build survival prediction models.
Overall, both the simulations at a medium censoring level and the real data studies show that our method performs well and outperforms existing methods in the selection, estimation, and prediction accuracy of main and interaction biomarkers. These applications reveal the advantages of the non-parametric Kendall's partial correlation approach over alternative semi-parametric marginal modeling methods. We also identified cancer-related G-E interaction biomarkers and reported the corresponding coefficients with p-values.