Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA.
Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA.
Bioinformatics. 2024 Oct 1;40(10). doi: 10.1093/bioinformatics/btae580.
Conditional testing via the knockoff framework allows one to identify, among a large number of possible explanatory variables, those that carry unique information about an outcome of interest, and it provides a false discovery rate guarantee on the selection. This approach is particularly well suited to the analysis of genome-wide association studies (GWAS), which have the goal of identifying genetic variants that influence traits of medical relevance.
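As an illustration of where the false discovery rate guarantee comes from, the following is a minimal sketch of the standard knockoff+ selection rule from the knockoff literature. It assumes that a feature statistic W_j has already been computed for each variable, with large positive values favoring the original variable over its knockoff. The function names and the example values are hypothetical and are not taken from Knockoffs.jl.

```python
def knockoff_plus_threshold(w, q):
    """Return the smallest t > 0 such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q,
    or infinity if no such t exists (hypothetical helper)."""
    # Only the nonzero magnitudes of W need to be tried as thresholds.
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)
        positives = sum(1 for x in w if x >= t)
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")


def knockoff_select(w, q):
    """Indices of variables whose statistic reaches the knockoff+ threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Made-up feature statistics for ten variables (illustrative only).
w = [5.1, 4.2, -0.3, 3.9, 0.8, -1.1, 2.7, 0.0, 3.3, 4.8]
selected = knockoff_select(w, q=0.2)
print(selected)
```

The count of negative statistics acts as an estimate of the number of false positives among the positive ones, which is why the ratio can be compared directly with the target level q.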
While conditional testing can be both more powerful and more precise than traditional GWAS analysis methods, its vanilla implementation encounters a difficulty common to all multivariate analysis methods: it is challenging to distinguish among multiple, highly correlated regressors. This impasse can be overcome by shifting the object of inference from single variables to groups of correlated variables. To achieve this, it is necessary to construct "group knockoffs." While successful examples are already documented in the literature, this paper substantially expands the set of algorithms and software for group knockoffs. We focus in particular on second-order knockoffs, for which we describe correlation matrix approximations that are appropriate for GWAS data and that result in considerable computational savings. We illustrate the effectiveness of the proposed methods with simulations and with the analysis of albuminuria data from the UK Biobank.
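To make the shift from single variables to groups concrete, here is a minimal sketch of one common way to form a group-level feature statistic: sum the absolute effect estimates of the original variables in a group and subtract the corresponding sum for their knockoffs. This is an assumption chosen for illustration, not necessarily the statistic used in the paper, and the function name, group labels, and coefficient values are hypothetical.

```python
def group_statistics(beta, beta_knockoff, groups):
    """Compute W_g = sum_{j in g} |beta_j| - sum_{j in g} |beta_knockoff_j|
    for each group label g (hypothetical helper)."""
    w = {}
    for b, bk, g in zip(beta, beta_knockoff, groups):
        w[g] = w.get(g, 0.0) + abs(b) - abs(bk)
    return w


# Made-up effect estimates for five variables and their knockoffs.
beta = [0.9, 0.7, 0.0, 0.1, 0.0]
beta_knockoff = [0.1, 0.0, 0.2, 0.0, 0.3]
# Variables 0 and 1 are highly correlated and share a group, as do 2 and 3.
groups = [0, 0, 1, 1, 2]

group_w = group_statistics(beta, beta_knockoff, groups)
print(group_w)
```

A positive W_g is evidence that the group as a whole carries signal, without having to decide which of its correlated members is responsible; the group statistics can then be passed to a knockoff selection rule in place of per-variable statistics.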
The described algorithms are implemented in the open-source Julia package Knockoffs.jl. R and Python wrappers are available as the knockoffsr and knockoffspy packages, respectively.