Liu Jin, Wang Kai, Ma Shuangge, Huang Jian
School of Public Health, Yale University, New Haven, CT 06520, USA.
Department of Biostatistics, University of Iowa, Iowa City, IA 52242, USA.
Stat Interface. 2013 Jan 1;6(1):99-115. doi: 10.4310/SII.2013.v6.n1.a10.
Penalized regression methods are becoming increasingly popular in genome-wide association studies (GWAS) for identifying genetic markers associated with disease. However, standard penalized methods such as LASSO do not take into account the possible linkage disequilibrium between adjacent markers. We propose a novel penalized approach for GWAS using a dense set of single nucleotide polymorphisms (SNPs). The proposed method uses the minimax concave penalty (MCP) for marker selection and incorporates linkage disequilibrium (LD) information by penalizing the difference of the genetic effects at adjacent SNPs with high correlation. A coordinate descent algorithm is derived to implement the proposed method. This algorithm is efficient in dealing with a large number of SNPs. A multi-split method is used to calculate the p-values of the selected SNPs for assessing their significance. We refer to the proposed penalty function as the smoothed MCP and the proposed approach as the SMCP method. The performance of the SMCP method is compared with that of the LASSO and MCP approaches in simulation studies, which demonstrate that the proposed method is more accurate in selecting associated SNPs. Its applicability to real data is illustrated using heterogeneous stock mice data and a rheumatoid arthritis data set.
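As a rough illustration of the penalty structure described in the abstract, the sketch below evaluates the standard MCP term plus a smoothness term on adjacent SNP effects. The MCP formula is the usual textbook definition. The smoothness term is an assumption made for illustration only: it weights the squared difference of absolute effects at neighbouring SNPs by the absolute sample correlation between their genotype columns, and the abstract does not confirm this exact form. The function names and the parameters `lam1`, `lam2`, and `gamma` are hypothetical.

```python
import numpy as np

def mcp(beta, lam, gamma):
    """Minimax concave penalty, applied elementwise (standard definition)."""
    a = np.abs(beta)
    inner = lam * a - a**2 / (2.0 * gamma)   # region |beta| <= gamma * lam
    outer = 0.5 * gamma * lam**2             # constant beyond gamma * lam
    return np.where(a <= gamma * lam, inner, outer)

def adjacent_abs_corr(X):
    """Absolute sample correlation between each pair of neighbouring columns.

    Assumes no column of X is constant (no monomorphic SNPs).
    """
    Xc = X - X.mean(axis=0)
    Xs = Xc / np.linalg.norm(Xc, axis=0)
    return np.abs(np.sum(Xs[:, :-1] * Xs[:, 1:], axis=0))

def smoothed_mcp_penalty(beta, X, lam1, lam2, gamma):
    """MCP on each effect plus an LD-weighted smoothness term.

    The smoothness term is an assumed form, not taken from the abstract:
    0.5 * lam2 * sum_j w_j * (|beta_j| - |beta_{j+1}|)^2,
    with w_j the absolute correlation between SNPs j and j+1.
    """
    w = adjacent_abs_corr(X)
    a = np.abs(beta)
    smooth = 0.5 * lam2 * np.sum(w * (a[:-1] - a[1:]) ** 2)
    return np.sum(mcp(beta, lam1, gamma)) + smooth
```

Under this assumed form, two highly correlated neighbouring SNPs with very different effect magnitudes incur a larger penalty than two with similar magnitudes, which is the behaviour the abstract attributes to the LD term.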