Division of Human Genetics, Department of Internal Medicine, The Ohio State University Wexner Medical Center, Columbus, Ohio, United States of America; Dr. John T. Macdonald Foundation Department of Human Genetics, University of Miami Miller School of Medicine, Miami, Florida, United States of America.
Genet Epidemiol. 2014 May;38(4):325-44. doi: 10.1002/gepi.21805. Epub 2014 Apr 10.
Monte Carlo permutation tests can be formally constructed by choosing a set of permutations of individual indices and a real-valued test statistic measuring the association between genotypes and affection status. In this paper, we develop a rigorous theoretical framework for verifying the validity of these tests when there are missing genotypes. We begin by specifying a nonparametric probability model for the observed genotype data in a genetic case-control study with unrelated subjects. Under this model and some minimal assumptions about the test statistic, we establish that the resulting Monte Carlo permutation test is exact level α if (1) the chosen set of permutations of individual indices is a group under composition and (2) the distribution of the observed genotype score matrix under the null hypothesis does not change if the assignment of individuals to rows is shuffled according to an arbitrary permutation in this set. We apply these conditions to show that frequently used Monte Carlo permutation tests based on the set of all permutations of individual indices are guaranteed to be exact level α only for missing data processes satisfying a rather restrictive additional assumption. However, if the missing data process depends on covariates that are all identified and recorded, we also show that Monte Carlo permutation tests based on the set of permutations within strata of individuals with identical covariate values are exact level α. Our theoretical results are verified and supplemented by simulations for a variety of missing data processes and test statistics.
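The construction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are illustrative assumptions: the test statistic (absolute difference in mean genotype score between cases and controls, computed over individuals with observed genotypes), the use of `None` to code a missing genotype, the function names `mean_diff_statistic`, `permute_within_strata` and `mc_permutation_test`, and the Monte Carlo p-value (1 + number of permuted statistics at least as large as the observed one) / (B + 1). Permutations are drawn uniformly from the set of permutations that keep every individual within its covariate stratum. That set is a group under composition, and with a single stratum it reduces to the set of all permutations of individual indices.

```python
import random
from collections import defaultdict
from typing import Callable, Hashable, List, Optional, Sequence

# Hypothetical coding: a genotype score (e.g. minor-allele count) or None if missing.
Genotype = Optional[float]
Statistic = Callable[[Sequence[Genotype], Sequence[int]], float]


def mean_diff_statistic(scores: Sequence[Genotype], status: Sequence[int]) -> float:
    """Illustrative statistic: |mean score in cases - mean score in controls|,
    using only individuals with an observed genotype (1 = case, 0 = control)."""
    cases = [g for g, y in zip(scores, status) if g is not None and y == 1]
    controls = [g for g, y in zip(scores, status) if g is not None and y == 0]
    if not cases or not controls:
        return 0.0  # undefined if a group has no observed genotypes; treat as no signal
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def permute_within_strata(strata: Sequence[Hashable], rng: random.Random) -> List[int]:
    """Uniform draw from the permutations of 0..n-1 that map every stratum onto
    itself. These permutations form a group; one stratum gives all permutations."""
    members = defaultdict(list)
    for i, s in enumerate(strata):
        members[s].append(i)
    perm = list(range(len(strata)))
    for idx in members.values():
        shuffled = idx[:]
        rng.shuffle(shuffled)
        for src, dst in zip(idx, shuffled):
            perm[src] = dst  # both indices lie in the same stratum
    return perm


def mc_permutation_test(scores: Sequence[Genotype], status: Sequence[int],
                        strata: Optional[Sequence[Hashable]] = None,
                        statistic: Statistic = mean_diff_statistic,
                        n_perm: int = 999, seed: int = 0) -> float:
    """Monte Carlo permutation p-value (1 + #{T_b >= T_obs}) / (n_perm + 1)."""
    n = len(scores)
    if strata is None:
        strata = [0] * n  # single stratum: all permutations of individual indices
    rng = random.Random(seed)
    t_obs = statistic(scores, status)
    tol = 1e-12  # treat floating-point near-ties as ties
    exceed = 0
    for _ in range(n_perm):
        perm = permute_within_strata(strata, rng)
        # Reassign individuals to rows: a missing genotype moves with its
        # individual, while the affection-status vector is held fixed.
        permuted = [scores[perm[i]] for i in range(n)]
        if statistic(permuted, status) >= t_obs - tol:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Passing a vector of covariate strata corresponds to the within-stratum scheme that the abstract shows is exact level α when the missing data process depends only on recorded covariates. Omitting `strata` gives the all-permutations test, which is guaranteed exact only under a more restrictive assumption about missingness.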