Frank Dudbridge
MRC Biostatistics Unit, Cambridge, UK.
Hum Hered. 2008;66(2):87-98. doi: 10.1159/000119108. Epub 2008 Mar 31.
Missing data occur in genetic association studies for several reasons, including missing family members and uncertain haplotype phase. Maximum likelihood is a commonly used approach to accommodate missing data, but it can be difficult to apply to family-based association studies because it may lose robustness to confounding by population stratification. Here a novel likelihood for nuclear families is proposed, in which distinct sets of association parameters are used to model the parental genotypes and the offspring genotypes. This approach is robust to population structure when the data are complete, and suffers only a minor loss of robustness when there are missing data. It also allows a novel conditioning step that gives valid analysis of multiple offspring in the presence of linkage. Unrelated subjects are included by regarding them as the children of two missing parents. Simulations and theory indicate operating characteristics similar to those of TRANSMIT, but without bias when data are missing in the presence of linkage. In comparison with FBAT and PCPH, the proposed model is slightly less robust to population structure but has greater power to detect strong effects. In comparison with APL and MITDT, the model is more robust to stratification and can accommodate sibships of any size. The methods are implemented for binary and continuous traits in the software UNPHASED, available from the author.
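The central idea, separate parameters for the parental genotypes and for the offspring genotypes given their parents, can be illustrated with a minimal sketch. It assumes a single biallelic SNP coded as 0, 1 or 2 copies of an allele A, affected offspring only, parents in Hardy-Weinberg equilibrium with allele frequency `p`, and a multiplicative allelic relative risk `psi` for transmission to the child. These are simplifying assumptions and the function names are hypothetical; this is not the UNPHASED implementation or the paper's exact parameterization. A missing parent is summed over, so an unrelated subject is entered as a child with two missing parents, `(None, None, g)`.

```python
import math
from itertools import product


def transmit_prob(child, mother, father):
    """Mendelian probability that parents carrying `mother` and `father`
    copies of allele A (0, 1 or 2) have a child carrying `child` copies."""
    # Probability that each parent transmits allele A.
    pm, pf = mother / 2, father / 2
    probs = [(1 - pm) * (1 - pf),
             pm * (1 - pf) + (1 - pm) * pf,
             pm * pf]
    return probs[child]


def offspring_prob(child, mother, father, psi):
    """P(child genotype | parents, child affected), assuming a multiplicative
    allelic relative risk psi: genotype g gets weight psi**g."""
    weights = [transmit_prob(h, mother, father) * psi ** h for h in range(3)]
    return weights[child] / sum(weights)


def parent_prob(g, p):
    """Parental genotype probability under Hardy-Weinberg equilibrium with
    allele frequency p (an assumption; p is separate from psi)."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2][g]


def trio_loglik(trios, p, psi):
    """Log-likelihood of (mother, father, child) genotypes.
    None marks a missing parent, whose genotype is summed over."""
    total = 0.0
    for mother, father, child in trios:
        m_vals = range(3) if mother is None else [mother]
        f_vals = range(3) if father is None else [father]
        lik = 0.0
        for m, f in product(m_vals, f_vals):
            # Parental term uses p only; offspring term uses psi only.
            lik += (parent_prob(m, p) * parent_prob(f, p)
                    * offspring_prob(child, m, f, psi))
        total += math.log(lik)
    return total


if __name__ == "__main__":
    # Three complete trios plus one unrelated case (two missing parents).
    data = [(1, 0, 1), (1, 1, 2), (2, 1, 1), (None, None, 2)]
    for psi in (1.0, 1.5, 2.0):
        print(psi, trio_loglik(data, 0.3, psi))
```

With complete trios the log-likelihood splits into a parental term that depends only on `p` and an offspring term that depends only on `psi`, so changing `p` shifts it by a constant and leaves the estimate of `psi` unchanged. When a parent is missing, the sum over its genotypes mixes `p` and `psi` in a single term, which is where sensitivity to population structure can enter.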