Allen Andrew S, Rathouz Paul J, Satten Glen A
Department of Biostatistics and Bioinformatics and Duke Clinical Research Institute, Duke University Medical Center, Durham, NC, USA.
Am J Hum Genet. 2003 Mar;72(3):671-80. doi: 10.1086/368276. Epub 2003 Feb 14.
We consider the effect of informative missingness on association tests that use parental genotypes as controls and that allow for missing parental data. Parental data can be informatively missing when the probability of a parent being available for study is related to that parent's genotype; when this occurs, the distribution of genotypes among observed parents is not representative of the distribution of genotypes among the missing parents. Many previously proposed procedures that allow for missing parental data assume that these distributions are the same. We propose association tests that behave well when parental data are informatively missing, under the assumption that, for a given trio of paternal, maternal, and affected offspring genotypes, the genotypes of the parents and the sex of the missing parents, but not the genotype of the affected offspring, can affect parental missingness. (The same assumption is required for the validity of an analysis that ignores incomplete parent-offspring trios.) We use simulations to compare our approach with previously proposed procedures and show that even small amounts of informative missingness, if not taken into account, can have large, deleterious effects on the performance of tests.
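The central claim, that genotype-dependent availability makes observed parents unrepresentative of missing parents, can be checked with a small exact calculation. The sketch below is a toy illustration only, not the authors' proposed tests. It assumes Hardy-Weinberg equilibrium, a hypothetical allele frequency of 0.3, and hypothetical per-genotype missingness probabilities; the function names are likewise illustrative.

```python
# Toy illustration (hypothetical parameters, not the paper's method):
# genotype-dependent parental missingness makes the genotype distribution
# among observed parents differ from that among missing parents.

def hwe_genotype_probs(p):
    """Assumed Hardy-Weinberg probabilities for 0, 1, 2 copies of an allele with frequency p."""
    q = 1.0 - p
    return [q * q, 2 * p * q, p * p]

def conditional_distributions(p, miss_prob):
    """Return (observed, missing) genotype distributions for parents.

    miss_prob[g] is a hypothetical probability that a parent carrying
    g copies of the allele is unavailable for study.
    """
    prior = hwe_genotype_probs(p)
    observed = [pr * (1.0 - m) for pr, m in zip(prior, miss_prob)]
    missing = [pr * m for pr, m in zip(prior, miss_prob)]
    obs_total, mis_total = sum(observed), sum(missing)
    return [x / obs_total for x in observed], [x / mis_total for x in missing]

def allele_freq(dist):
    """Allele frequency implied by a genotype distribution over 0, 1, 2 copies."""
    return (dist[1] + 2.0 * dist[2]) / 2.0

if __name__ == "__main__":
    p = 0.3
    # Non-informative: every genotype has the same chance of being missing.
    obs_n, mis_n = conditional_distributions(p, [0.2, 0.2, 0.2])
    print(f"non-informative: observed {allele_freq(obs_n):.3f}, missing {allele_freq(mis_n):.3f}")
    # prints "non-informative: observed 0.300, missing 0.300"

    # Informative: parents carrying the allele are more often missing.
    obs_i, mis_i = conditional_distributions(p, [0.1, 0.2, 0.4])
    print(f"informative:     observed {allele_freq(obs_i):.3f}, missing {allele_freq(mis_i):.3f}")
    # prints "informative:     observed 0.267, missing 0.462"
```

Under the non-informative setting both groups reproduce the population frequency. Under the informative setting the observed parents understate the allele frequency, so a procedure that fills in missing parents from the observed ones would be working from the wrong distribution.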