Mathematical Institute, Heinrich Heine University, Düsseldorf, Germany.
IUF - Leibniz Research Institute for Environmental Medicine, Düsseldorf, Germany.
Sci Rep. 2023 Jan 17;13(1):937. doi: 10.1038/s41598-023-28172-4.
Gene-environment (GxE) interactions are an important and intricate component in the manifestation of complex phenotypes. Simple univariate tests lack statistical power because they require adjustment for multiple testing and do not account for potential interplay between several genetic loci. Approaches based on internally constructed genetic risk scores (GRS) require partitioning the available sample into training and test data sets, thereby lowering the effective sample size for testing the GxE interaction itself. To overcome these issues, we propose a statistical test that employs bagging (bootstrap aggregating) in the GRS construction step and utilizes its out-of-bag prediction mechanism. The key advantage of this approach is that the full available data set can be used both for constructing the GRS and for testing the GxE interaction. To also incorporate interactions between genetic loci, we further investigate whether using random forests as the GRS construction method in GxE interaction testing increases the statistical power even more. In a simulation study, we show that both novel procedures achieve higher statistical power for detecting GxE interactions while still controlling the type I error. In most scenarios, the random-forests-based test outperforms the bagging-based test that uses the elastic net as its base learner. An application of the testing procedures to a real data set from a German cohort study suggests that there might be a GxE interaction involving exposure to air pollution with respect to rheumatoid arthritis.