Department of Mathematical and Statistical Sciences, University of Colorado Denver, Denver, Colorado, USA.
Department of Epidemiology, Colorado School of Public Health, Aurora, Colorado, USA.
Genet Epidemiol. 2024 Sep;48(6):270-288. doi: 10.1002/gepi.22563. Epub 2024 Apr 21.
Genome-wide association studies (GWAS) typically use linear or logistic regression models to identify associations between phenotypes (traits) and genotypes (genetic variants) of interest. However, regression under the additive assumption has potential limitations. First, the assumption that residuals are normally distributed is rarely met in practice, and deviation from normality inflates the Type-I error rate. Second, a model built on the additive assumption ignores other genetic structures, such as dominant, recessive, and protective-risk cases, and ignoring these structures may lead to spurious conclusions about the association between a variant and a trait. We propose an assumption-free model built on data-consistent inversion (DCI), a recently developed measure-theoretic framework for uncertainty quantification. The proposed DCI-derived model builds a nonparametric distribution on the model inputs that propagates to the distribution of the observed data, without requiring the normality assumption on the residuals of the regression model. This property enables the DCI-derived model to cover all genetic variants without relying on the additivity of the classic GWAS model. Simulations and a replication GWAS using data from the COPDGene study demonstrate that this model controls the Type-I error rate at least as well as the classic GWAS (additive linear model) approach, while having similar or greater power to discover variants under different genetic modes of transmission.
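The DCI idea of a nonparametric input distribution that propagates to the distribution of the observed data can be illustrated with a minimal sketch. In the standard DCI formulation, an initial density pi_init on the model inputs lambda is pushed forward through a map Q to give a predicted density pi_pred on the outputs, and the updated density is pi_update(lambda) = pi_init(lambda) * pi_obs(Q(lambda)) / pi_pred(Q(lambda)). The Python code below estimates pi_pred and pi_obs nonparametrically with Gaussian kernel density estimates and draws from pi_update by rejection sampling. It is an illustration under stated assumptions, not the authors' GWAS model: the function names, the Scott's-rule bandwidth, and the toy linear map Q(lambda) = 2*lambda + 1 with simulated data are all hypothetical choices.

```python
import numpy as np


def gaussian_kde_1d(samples, points):
    """Evaluate a 1-D Gaussian kernel density estimate built from `samples` at `points`."""
    samples = np.asarray(samples, dtype=float)
    points = np.asarray(points, dtype=float)
    n = samples.size
    # Scott's rule-of-thumb bandwidth for one dimension: std * n^(-1/5)
    h = samples.std(ddof=1) * n ** (-1.0 / 5.0)
    z = (points[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2.0 * np.pi))


def dci_ratio(init_samples, qoi_map, obs_samples):
    """Return r(lambda) = pi_obs(Q(lambda)) / pi_pred(Q(lambda)) at each initial sample.

    Hypothetical helper: Q is any vectorized map from inputs to a scalar output.
    """
    q = qoi_map(init_samples)                 # push initial samples forward through Q
    pi_pred = gaussian_kde_1d(q, q)           # predicted (push-forward) density of the initial
    pi_obs = gaussian_kde_1d(obs_samples, q)  # observed density evaluated at the same outputs
    return pi_obs / pi_pred


def dci_rejection_sample(init_samples, r, rng):
    """Draw from pi_update = pi_init * r by accepting each initial sample with prob r / max(r)."""
    keep = rng.uniform(size=r.size) < r / r.max()
    return init_samples[keep]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy problem: one input parameter, linear map Q(lambda) = 2*lambda + 1.
    lam = rng.normal(0.0, 1.0, size=1500)  # samples from the initial density on the input
    obs = rng.normal(2.0, 0.5, size=500)   # simulated "observed" output data
    r = dci_ratio(lam, lambda x: 2.0 * x + 1.0, obs)
    print("mean of r:", round(float(r.mean()), 2))
    post = dci_rejection_sample(lam, r, rng)
    # The exact update here is lambda ~ N(0.5, 0.25^2), so that Q(lambda) ~ N(2, 0.5^2).
    print("updated mean, sd:", round(float(post.mean()), 2), round(float(post.std()), 2))
```

With these toy settings the sample mean of r should be close to 1, a common DCI diagnostic that the observed density is predictable from the initial one, and the accepted samples should approximate lambda ~ N(0.5, 0.25^2), slightly widened by the kernel smoothing. No normality assumption on residuals enters the update itself; the Gaussian choices above only generate the toy data.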