Sesia M, Sabatti C, Candès E J
Department of Statistics, Stanford University, 390 Serra Mall, Stanford, California, USA.
Biometrika. 2019 Mar;106(1):1-18. doi: 10.1093/biomet/asy033. Epub 2018 Aug 4.
Modern scientific studies often require the identification of a subset of explanatory variables. Several statistical methods have been developed to automate this task, and the framework of knockoffs has been proposed as a general solution for variable selection under rigorous Type I error control, without relying on strong modelling assumptions. In this paper, we extend the methodology of knockoffs to problems where the distribution of the covariates can be described by a hidden Markov model. We develop an exact and efficient algorithm to sample knockoff variables in this setting and then argue that, combined with the existing selective framework, this provides a natural and powerful tool for inference in genome-wide association studies with guaranteed false discovery rate control. We apply our method to datasets on Crohn's disease and some continuous phenotypes.
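The selection step that the abstract calls the existing selective framework can be illustrated with a minimal sketch. It assumes the knockoff variables have already been sampled and that each variable j has been given a feature statistic W[j], where a large positive value suggests the original variable is more important than its knockoff. The function name `knockoff_select`, the example statistics, and the target level `q` are hypothetical and chosen for illustration. The rule shown is the standard knockoff+ threshold from the general knockoff filter, not the hidden Markov model sampling algorithm developed in the paper.

```python
import numpy as np

def knockoff_select(W, q=0.1):
    """Return indices selected by the knockoff+ threshold at target FDR level q.

    W is an array of feature statistics, assumed to be already computed
    from the original variables and their knockoffs.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes of W, ascending.
    candidates = np.unique(np.abs(W[W != 0]))
    for t in candidates:
        # Conservative estimate of the false discovery proportion at threshold t:
        # statistics at or below -t stand in for false positives at or above +t.
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return np.flatnonzero(W >= t)
    # No threshold meets the target level, so nothing is selected.
    return np.array([], dtype=int)

# Hypothetical statistics for ten variables.
W_example = [5, 4, 3, 2.5, 2, 1.5, -1, 0.5, -0.2, 6]
selected = knockoff_select(W_example, q=0.2)
```

The `1 +` in the numerator is what makes the rule conservative: with only a few candidate variables, or with a small `q`, the function can legitimately return an empty selection.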