Interdisciplinary Graduate Program in Applied Mathematical and Computational Sciences.
Department of Biostatistics, University of Iowa, Iowa City, IA 52241, USA.
Bioinformatics. 2017 Dec 15;33(24):3887-3894. doi: 10.1093/bioinformatics/btx522.
Genome-wide association studies (GWAS) have played an important role in identifying genetic variants underlying human complex traits. However, their success is hindered by weak effects at causal variants and the presence of noise at non-causal variants. To overcome these difficulties, a previous study proposed a regularized regression method that penalizes the difference in signal strength between two consecutive single-nucleotide polymorphisms (SNPs).
We generalize the aforementioned method so that more adjacent SNPs can be incorporated, and we study the choice of the optimal number of SNPs. Simulation studies indicate that when consecutive SNPs have similar absolute coefficients, our method performs better than the LASSO penalty. In other situations, our method remains comparable to the LASSO penalty. The practical utility of the proposed method is demonstrated by applying it to the Genetic Analysis Workshop 16 rheumatoid arthritis GWAS data.
An implementation of the proposed method is provided in the R package MWLasso.
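As a rough illustration of the moving-window idea described above, the following Python sketch evaluates one plausible form of such a penalty: an L1 (LASSO-type) term plus squared differences between the absolute coefficients of each SNP and the SNPs that follow it within a window. This is not the MWLasso API, and the exact penalty used by the method may differ (for example, in its weighting or in the sparsity term). The function name `moving_window_penalty`, the window size `d`, and the tuning parameters `lam1` and `lam2` are all hypothetical names introduced here for illustration.

```python
import numpy as np


def moving_window_penalty(beta, d, lam1, lam2):
    """Illustrative penalty (an assumption, not the published definition).

    Returns lam1 * sum_j |beta_j|
          + lam2 * sum over pairs (j, k) with 0 < k - j < d
                   of (|beta_j| - |beta_k|)^2.

    With d = 2 only consecutive SNPs are compared; larger d brings in
    more adjacent SNPs; d = 1 leaves only the L1 term.
    """
    abs_beta = np.abs(np.asarray(beta, dtype=float))
    p = abs_beta.size

    # Sparsity term: the usual L1 norm of the coefficient vector.
    sparsity = lam1 * abs_beta.sum()

    # Smoothness term: compare each SNP with the SNP `lag` positions later,
    # for every lag inside the window.
    smoothness = 0.0
    for lag in range(1, d):
        if lag >= p:
            break
        diffs = abs_beta[:-lag] - abs_beta[lag:]
        smoothness += float(np.sum(diffs ** 2))

    return float(sparsity + lam2 * smoothness)


if __name__ == "__main__":
    beta = [0.0, 0.8, -0.9, 0.0]  # made-up coefficients for four adjacent SNPs
    for window in (1, 2, 3):
        print(window, moving_window_penalty(beta, window, lam1=1.0, lam2=0.5))
```

Because the smoothness term compares absolute values, neighboring SNPs with coefficients of similar magnitude but opposite sign are not penalized for differing, which matches the setting in which the simulations favor the proposed method over the LASSO penalty.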