Cordell Heather J, Clayton David G
Department of Medical Genetics, University of Cambridge, Cambridge, United Kingdom.
Am J Hum Genet. 2002 Jan;70(1):124-41. doi: 10.1086/338007. Epub 2001 Nov 21.
A stepwise logistic-regression procedure is proposed for evaluating the relative importance of variants at different sites within a small genetic region. By fitting statistical models with main effects, rather than modeling the full haplotype effects, we generate tests with few degrees of freedom that are likely to be powerful for detecting primary etiological determinants. The approach is applicable to either case/control or nuclear-family data: case/control data are modeled via unconditional logistic regression, and family data via conditional logistic regression. Four different conditioning strategies are proposed for evaluating effects at multiple, closely linked loci when family data are used. The first strategy results in a likelihood that is equivalent to that of a matched case/control study in which each affected offspring is matched to three pseudocontrols, whereas the second strategy is equivalent to matching each affected offspring with between one and three pseudocontrols. Both of these strategies require that parental phase (i.e., the haplotypes present in the parents) can be inferred. Families in which phase cannot be determined must be discarded, which can considerably reduce the effective size of a data set, particularly when large numbers of loci that are not very polymorphic are being considered. Therefore, a third strategy is proposed that does not require knowledge of parental phase, so that families with ambiguous phase can be included in the analysis. The fourth and final strategy is to use conditioning method 2 when parental phase can be inferred and conditioning method 3 otherwise. The methods are illustrated using nuclear-family data to evaluate the contribution of loci in the HLA region to the development of type 1 diabetes.
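As a rough illustration of the first conditioning strategy, the sketch below builds one matched set from a family with known parental phase: the affected offspring's genotype is the case, and the three other haplotype pairs that the parents could have transmitted serve as pseudocontrols. The function names (`matched_set`, `allele_count`), the tuple-based data layout, and the example alleles are assumptions made for illustration and are not taken from the paper. The resulting sets would then be passed to a conditional logistic-regression routine, which is not shown here.

```python
from itertools import product

def matched_set(father, mother, transmitted):
    """Build a case and three pseudocontrols from phased parental haplotypes.

    father, mother: pairs of haplotypes; each haplotype is a tuple of alleles,
                    one entry per locus (hypothetical layout).
    transmitted:    (i, j), where i indexes the paternal haplotype and j the
                    maternal haplotype passed to the affected offspring.
    """
    transmitted = tuple(transmitted)
    case = (father[transmitted[0]], mother[transmitted[1]])
    # The other three paternal/maternal haplotype combinations are the pseudocontrols.
    pseudocontrols = [
        (father[i], mother[j])
        for i, j in product((0, 1), repeat=2)
        if (i, j) != transmitted
    ]
    return case, pseudocontrols

def allele_count(genotype, locus, allele):
    """Main-effect coding: number of copies (0, 1, or 2) of an allele at one locus."""
    return sum(1 for haplotype in genotype if haplotype[locus] == allele)

# Hypothetical two-locus family with known phase.
father = (("A", "1"), ("a", "2"))
mother = (("A", "2"), ("a", "1"))
case, pseudocontrols = matched_set(father, mother, transmitted=(0, 1))
```

Coding each locus by allele counts, rather than by the full haplotype, corresponds to the main-effects modeling described in the abstract and keeps the number of parameters small.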