A comparative study on the unified model based multifactor dimensionality reduction methods for identifying gene-gene interactions associated with the survival phenotype.

Authors

Lee Jung Wun, Lee Seungyeoun

Affiliations

Department of Statistics, University of Connecticut, Storrs, CT, USA.

Department of Mathematics and Statistics, Sejong University, 209 Neungdong-ro, Gwangjin-gu, Seoul, 05006, South Korea.

Publication information

BioData Min. 2021 Mar 1;14(1):17. doi: 10.1186/s13040-021-00248-9.

Abstract

BACKGROUND

For gene-gene interaction analysis, the multifactor dimensionality reduction (MDR) method has been widely employed to reduce multi-level gene-gene interactions to a binary attribute of high- or low-risk groups. For the survival phenotype, Surv-MDR was proposed first, using a log-rank test statistic, and the Cox-MDR method was later proposed using a martingale residual of a Cox model. Recently, the KM-MDR method was proposed using the Kaplan-Meier median survival time as a classifier. All three methods use a cross-validation procedure to identify single nucleotide polymorphism (SNP) by SNP interactions among all possible SNP pairs. Furthermore, these methods require a permutation test to verify the significance of the selected SNP pairs. However, the unified model-based multifactor dimensionality reduction method (UM-MDR) overcomes this shortcoming of MDR by unifying the significance testing with the MDR algorithm within the framework of a regression model. Neither cross-validation nor permutation testing is required to identify SNP by SNP interactions in the UM-MDR method. The UM-MDR method comprises two steps. In the first step, multi-level genotypes are classified into high- or low-risk groups, and an indicator variable for the high-risk group is defined. In the second step, the significance of this indicator variable is tested in a regression model that includes other adjusting covariates. The Cox-UMMDR method was recently proposed by combining Cox-MDR with UM-MDR to identify gene-gene interactions associated with the survival phenotype. In this study, we propose two simple methods: KM-UMMDR, which combines KM-MDR with UM-MDR, and Cox2-UMMDR, which modifies Cox-UMMDR by adjusting for the covariate effect in step 1 rather than in step 2.
The KM-UMMDR method allows the covariate effect to be adjusted for in the regression model of step 2, although KM-MDR cannot adjust for the covariate effect in the classification procedure of step 1. In contrast, Cox2-UMMDR obtains the martingale residuals from a Cox model that adjusts for the covariate effect in step 1, whereas Cox-UMMDR adjusts for the covariate effect in the regression model of step 2. We performed simulation studies to compare the power of KM-UMMDR, Cox-UMMDR, Cox2-UMMDR, Cox-MDR, and KM-MDR, considering the effect of covariates and the marginal effect of SNPs. We also analyzed a real dataset of Korean leukemia patients for illustration and provide a short discussion.
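
Step 1 of a KM-based variant can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names `km_median` and `high_risk_indicator` are hypothetical, and the rule that a genotype cell is high-risk when its Kaplan-Meier median survival time falls below the pooled median is an assumption, since the abstract does not state the exact threshold. Step 2, which tests the indicator in a regression model with covariates, is not shown.

```python
import math
from collections import defaultdict
from fractions import Fraction


def km_median(times, events):
    """Kaplan-Meier median: smallest event time t with S(t) <= 1/2.

    events[i] is 1 for an observed event and 0 for a censored time.
    Returns math.inf if the survival curve never drops to 1/2.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = Fraction(1)  # exact arithmetic avoids rounding at the 1/2 boundary
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect everyone whose time equals t (handles ties).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= Fraction(at_risk - deaths, at_risk)
            if surv <= Fraction(1, 2):
                return t
        at_risk -= removed
    return math.inf


def high_risk_indicator(snp_a, snp_b, times, events):
    """Assumed rule: a genotype cell is high-risk (1) when its KM median
    is shorter than the pooled KM median; otherwise it is low-risk (0)."""
    pooled = km_median(times, events)
    cells = defaultdict(list)
    for idx, genotype in enumerate(zip(snp_a, snp_b)):
        cells[genotype].append(idx)
    high = {
        genotype
        for genotype, idxs in cells.items()
        if km_median([times[i] for i in idxs], [events[i] for i in idxs]) < pooled
    }
    return [1 if g in high else 0 for g in zip(snp_a, snp_b)]


# Toy data: two genotype cells, no censoring.
print(high_risk_indicator([0, 0, 0, 1, 1, 1], [0] * 6,
                          [1, 2, 3, 10, 11, 12], [1] * 6))
# → [1, 1, 1, 0, 0, 0]
```

The returned 0/1 vector plays the role of the high-risk indicator variable that step 2 would test alongside the adjusting covariates.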

RESULTS

In the simulation study, two scenarios are considered. The first compares power with and without a covariate effect. The second compares power with and without main effects of SNPs. In the simulation results, Cox-UMMDR performs best across all scenarios, ahead of KM-UMMDR, Cox2-UMMDR, Cox-MDR and KM-MDR. As expected, both Cox-UMMDR and Cox-MDR perform better than KM-UMMDR and KM-MDR when a covariate effect exists, because the former adjust for the covariate effect but the latter cannot. However, Cox2-UMMDR behaves similarly to KM-UMMDR and KM-MDR even when there is a covariate effect. This implies that the covariate effect is adjusted for more efficiently in the regression model of the second step than in the classification procedure of the first step. When any SNP has a main effect, Cox-UMMDR, Cox2-UMMDR and KM-UMMDR perform better than Cox-MDR and KM-MDR, provided the main effects of SNPs are properly adjusted for in the regression model. Across the two scenarios, Cox-UMMDR seems to be the most robust when there is either a covariate effect to adjust for or an SNP with a main effect on the survival phenotype. In addition, the power of all methods decreased as the censoring fraction increased from 0.1 to 0.3, as heritability increased. The power of all methods seems to be greater under a minor allele frequency (MAF) of 0.2 than under MAF = 0.4. For illustration, both KM-UMMDR and Cox2-UMMDR were applied to a real dataset of Korean leukemia patients to identify SNP by SNP interactions associated with the survival phenotype.
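
The Cox-based variants classify genotype cells by martingale residuals rather than by median survival time. As a rough sketch of that step without any covariate adjustment, the residuals of a covariate-free Cox model reduce to the event indicator minus the Nelson-Aalen cumulative hazard at each subject's time. The function names below are hypothetical, and labeling a cell high-risk when its residuals sum to a positive value is an assumption about the classification rule, which this abstract does not spell out. Adjusting for covariates in step 1, as Cox2-UMMDR does, would instead need residuals from a fitted Cox model and is not shown.

```python
from collections import defaultdict


def null_martingale_residuals(times, events):
    """Martingale residuals of a Cox model with no covariates:
    residual_i = event_i - H(t_i), where H is the Nelson-Aalen
    cumulative hazard (events[i] is 1 for an event, 0 if censored)."""
    n = len(times)
    order = sorted(range(n), key=lambda k: times[k])
    cumhaz_at = {}
    cumhaz = 0.0
    at_risk = n
    i = 0
    while i < n:
        t = times[order[i]]
        j = i
        deaths = 0
        # Group subjects sharing the same time (handles ties).
        while j < n and times[order[j]] == t:
            deaths += events[order[j]]
            j += 1
        cumhaz += deaths / at_risk
        cumhaz_at[t] = cumhaz
        at_risk -= j - i
        i = j
    return [events[k] - cumhaz_at[times[k]] for k in range(n)]


def high_risk_by_residuals(snp_a, snp_b, times, events):
    """Assumed rule: a genotype cell is high-risk (1) when the sum of
    its members' martingale residuals is positive, else low-risk (0)."""
    residuals = null_martingale_residuals(times, events)
    cell_sum = defaultdict(float)
    for genotype, r in zip(zip(snp_a, snp_b), residuals):
        cell_sum[genotype] += r
    return [1 if cell_sum[g] > 0 else 0 for g in zip(snp_a, snp_b)]


# Toy data: the cell with early events collects positive residuals.
print(high_risk_by_residuals([0, 0, 0, 1, 1, 1], [0] * 6,
                             [1, 2, 3, 10, 11, 12], [1] * 6))
# → [1, 1, 1, 0, 0, 0]
```

A positive residual means a subject experienced the event earlier than the pooled hazard would predict, which is why a positive cell sum is read here as elevated risk.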

CONCLUSION

Both KM-UMMDR and Cox2-UMMDR were easily implemented by combining KM-MDR and Cox-MDR with UM-MDR, respectively, to detect significant gene-gene interactions associated with survival time without cross-validation or permutation testing. The simulation results demonstrate the utility of KM-UMMDR, Cox2-UMMDR and Cox-UMMDR, which outperform Cox-MDR and KM-MDR when some SNPs with only marginal effects might mask the detection of causal epistasis. In addition, Cox-UMMDR, Cox2-UMMDR and Cox-MDR performed better than KM-UMMDR and KM-MDR when there were potentially confounding covariate effects.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c3ad/7923479/b882e4a12083/13040_2021_248_Fig1_HTML.jpg
