Chun Li, Michael Boehnke
Department of Biostatistics, Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee 37232-0700, USA.
Genet Epidemiol. 2006 Apr;30(3):220-30. doi: 10.1002/gepi.20139.
In haplotype-based association studies for late onset diseases, one attractive design is to use available unaffected spouses as controls (Valle et al. [1998] Diab. Care 21:949-958). Given cases and spouses only, the standard expectation-maximization (EM) algorithm (Dempster et al. [1977] J. R. Stat. Soc. B 39:1-38) for case-control data can be used to estimate haplotype frequencies. But often we will have offspring for at least some of the spouse pairs, and offspring genotypes provide additional information about the haplotypes of the parents. Existing methods may either ignore the offspring information, or reconstruct haplotypes for the subjects using offspring information and discard data from those whose haplotypes cannot be reconstructed with high confidence. Neither of these approaches is efficient, and the latter approach may also be biased. For case-control data with some subjects forming spouse pairs and offspring genotypes available for some spouse pairs or individuals, we propose a unified, likelihood-based method of haplotype inference. The method makes use of available offspring genotype information to apportion ambiguous haplotypes for the subjects. For subjects without offspring genotype information, haplotypes are apportioned as in the standard EM algorithm for case-control data. Our method enables efficient haplotype frequency estimation using an EM algorithm and supports probabilistic haplotype reconstruction with the probability calculated based on the whole sample. We describe likelihood ratio and permutation tests to test for disease-haplotype association, and describe three test statistics that are potentially useful for detecting such an association.
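The sketch below is a minimal, hypothetical illustration of how offspring genotypes can resolve parental phase inside a gene-counting EM. It is not the authors' implementation. It makes several assumptions:

- biallelic SNPs, with genotypes coded as 0/1/2 copies of allele 1 and no missing data;
- Hardy-Weinberg equilibrium and random mating;
- no recombination between the markers during transmission;
- exactly one genotyped child per spouse pair;
- a single pooled frequency vector for cases and controls, as under the null of no association.

Subjects without offspring are handled as in the standard case-control EM. For a spouse pair, only joint parental phase configurations that could produce the child's genotype receive weight. The function names (`compatible_pairs`, `transmission_prob`, `family_configs`, `em_haplotype_freqs`) are illustrative only.

```python
import itertools
import math
from collections import defaultdict


def compatible_pairs(geno):
    """Ordered haplotype pairs (h1, h2) whose allele sums equal the 0/1/2 genotype."""
    site_options = []
    for g in geno:
        if g == 0:
            site_options.append([(0, 0)])
        elif g == 2:
            site_options.append([(1, 1)])
        else:  # heterozygous site: allele 1 may sit on either haplotype
            site_options.append([(0, 1), (1, 0)])
    pairs = []
    for combo in itertools.product(*site_options):
        pairs.append((tuple(a for a, _ in combo), tuple(b for _, b in combo)))
    return pairs


def transmission_prob(mother, father, child_geno):
    """P(child genotype | parents' haplotype pairs); each parent passes one
    of its two haplotypes intact with probability 1/2 (no recombination)."""
    hits = sum(1 for hm in mother for hf in father
               if tuple(x + y for x, y in zip(hm, hf)) == tuple(child_geno))
    return hits / 4.0


def family_configs(mother_geno, father_geno, child_geno):
    """Joint parental phase configurations that can produce the child's genotype."""
    configs = []
    for m in compatible_pairs(mother_geno):
        for f in compatible_pairs(father_geno):
            t = transmission_prob(m, f, child_geno)
            if t > 0:
                configs.append((m + f, t))  # four founder haplotypes, constant factor
    if not configs:
        raise ValueError("Mendelian inconsistency in family")
    return configs


def em_haplotype_freqs(singletons, families, n_snps, max_iter=500, tol=1e-10):
    """Pooled founder-haplotype frequencies from subjects without offspring data
    (singletons) and (mother, father, child) genotype triples (families)."""
    n_founder_haps = 2 * len(singletons) + 4 * len(families)
    if n_founder_haps == 0:
        raise ValueError("no genotypes supplied")
    haps = list(itertools.product((0, 1), repeat=n_snps))
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting values
    units = [[(pair, 1.0) for pair in compatible_pairs(g)] for g in singletons]
    units += [family_configs(m, f, c) for m, f, c in families]
    prev_ll = -math.inf
    for _ in range(max_iter):
        counts = defaultdict(float)
        ll = 0.0
        for configs in units:
            # E-step: weight of each phase configuration under current frequencies.
            weights = [t * math.prod(freq[h] for h in hs) for hs, t in configs]
            total = sum(weights)
            ll += math.log(total)
            for (hs, _), w in zip(configs, weights):
                for h in hs:
                    counts[h] += w / total
        # M-step: expected counts divided by the number of founder haplotypes.
        freq = {h: counts[h] / n_founder_haps for h in haps}
        if ll - prev_ll < tol:  # EM log-likelihood is non-decreasing
            break
        prev_ll = ll
    return freq


if __name__ == "__main__":
    # Double-heterozygous mother; the father's and child's genotypes fix her phase.
    print(em_haplotype_freqs([], [((1, 1), (0, 0), (1, 1))], n_snps=2))
    # The same mother typed without offspring stays ambiguous between phases.
    print(em_haplotype_freqs([(1, 1)], [], n_snps=2))
```

In the first call, the child's genotype rules out the (0,1)/(1,0) phase for the mother. In the second call, both phases remain equally weighted. The brute-force enumeration over all 2^L haplotypes is only practical for a handful of SNPs. It is chosen here for transparency, not efficiency.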