He Qianchuan, Cai Tianxi, Liu Yang, Zhao Ni, Harmon Quaker E, Almli Lynn M, Binder Elisabeth B, Engel Stephanie M, Ressler Kerry J, Conneely Karen N, Lin Xihong, Wu Michael C
Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America.
Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America.
Genet Epidemiol. 2016 Dec;40(8):722-731. doi: 10.1002/gepi.21993. Epub 2016 Aug 3.
Kernel machine learning methods, such as the SNP-set kernel association test (SKAT), have been widely used to test associations between traits and genetic polymorphisms. In contrast to traditional single-SNP analysis methods, these methods examine the joint effect of a set of related SNPs (such as the SNPs within a gene or a pathway) and can identify SNP sets that are associated with the trait of interest. However, as with many multi-SNP testing approaches, kernel machine testing can draw conclusions only at the SNP-set level, and does not directly indicate which SNPs in an identified set are actually driving the association. A recently proposed procedure, KerNel Iterative Feature Extraction (KNIFE), provides a general framework for incorporating variable selection into kernel machine methods. In this article, we focus on quantitative traits and relatively common SNPs. We adapt the KNIFE procedure to genetic association studies and propose an approach to identify driver SNPs after SKAT has been applied to gene set analysis. Our approach accommodates several kernels that are widely used in SNP analysis, such as the linear kernel and the Identity by State (IBS) kernel. The proposed approach provides practical tools for prioritizing SNPs, and fills the gap between SNP set analysis and biological functional studies. Both simulation studies and a real data application are used to demonstrate the proposed approach.
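To make the two kernels named in the abstract concrete, the following is a minimal sketch of how they are commonly defined for genotypes coded as minor-allele counts (0, 1 or 2). It assumes the usual unweighted forms: the linear kernel is the inner product of two subjects' genotype vectors, and the IBS kernel is the average proportion of alleles shared identical by state across the SNPs in the set. The function names `linear_kernel` and `ibs_kernel` and the toy genotype matrix are illustrative assumptions, not part of the authors' software, and the sketch covers neither the SKAT test nor the KNIFE selection step.

```python
def linear_kernel(genotypes):
    """Linear kernel: K[i][j] = sum over SNPs of g_i * g_j."""
    return [
        [sum(a * b for a, b in zip(gi, gj)) for gj in genotypes]
        for gi in genotypes
    ]


def ibs_kernel(genotypes):
    """IBS kernel: K[i][j] = sum over SNPs of (2 - |g_i - g_j|) / (2 * n_snps).

    With 0/1/2 coding, 2 - |g_i - g_j| counts the alleles two subjects
    share identical by state at a SNP, so each entry lies in [0, 1].
    """
    n_snps = len(genotypes[0])
    return [
        [
            sum(2 - abs(a - b) for a, b in zip(gi, gj)) / (2 * n_snps)
            for gj in genotypes
        ]
        for gi in genotypes
    ]


# Toy data (hypothetical): 3 subjects (rows) by 3 SNPs (columns).
genotypes = [
    [0, 1, 2],
    [0, 1, 2],
    [2, 1, 0],
]

k_lin = linear_kernel(genotypes)
k_ibs = ibs_kernel(genotypes)

for row in k_ibs:
    print([round(value, 3) for value in row])
```

Subjects with identical genotypes receive an IBS similarity of 1, while subjects who are opposite homozygotes at a SNP contribute 0 for that SNP. Either matrix can serve as the subject-by-subject similarity matrix that a kernel machine test operates on.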