Mixed-effects logistic approach for association following linkage scan for complex disorders.

Authors

Xu H, Shete S

Affiliation

Department of Epidemiology, The University of Texas M.D. Anderson Cancer Center, Houston, TX 77030, USA.

Publication information

Ann Hum Genet. 2007 Mar;71(Pt 2):230-7. doi: 10.1111/j.1469-1809.2006.00321.x. Epub 2006 Oct 9.

DOI:10.1111/j.1469-1809.2006.00321.x

PMID:17032287

Abstract

An association study to identify possible causal single nucleotide polymorphisms following linkage scanning is a popular approach for the genetic dissection of complex disorders. However, in association studies cases and controls are assumed to be independent, i.e., genetically unrelated. Choosing a single affected individual per family is statistically inefficient and leads to a loss of power. On the other hand, because of the relatedness of family members, using affected family members and unrelated normal controls directly leads to false-positive results in association studies. In this paper we propose a new approach using mixed-model logistic regression, in which association tests are performed using family members and unrelated controls. Thus, the important genetic information can be obtained from family members while retaining high statistical power. To examine the properties of this new approach we developed an efficient algorithm to simulate environmental risk factors and the genotypes at both the disease locus and a marker locus, with and without linkage disequilibrium (LD), in families. Extensive simulation studies showed that our approach can effectively control the type I error probability. Our approach is better than family-based designs such as the transmission disequilibrium test (TDT), because it allows the use of unrelated cases and controls and uses all of the affected members for whom DNA samples are possibly already available. Our approach also allows the inclusion of covariates such as age and smoking status. Power analysis showed that our method has higher statistical power than recent likelihood ratio-based methods when environmental factors contribute to disease susceptibility, which is true for most complex human disorders. Our method can be further extended to accommodate more complex pedigree structures.
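The abstract does not spell out the model, so the following is only a minimal sketch of the general idea, under stated assumptions: each family shares one normally distributed random intercept, the genotype is coded additively (0, 1 or 2 copies of the marker allele), and each unrelated control is treated as a family of size one. The function names, the genotype coding, and the grid-based integration are illustrative choices, not the authors' implementation.

```python
import math


def logistic(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def family_log_likelihood(y, g, beta0, beta_g, sigma, n_grid=201, width=6.0):
    """Marginal log-likelihood of one family (hypothetical helper).

    Assumed model: logit P(y_ij = 1 | u_i) = beta0 + beta_g * g_ij + sigma * u_i,
    with u_i ~ N(0, 1) shared by all members of family i.
    The shared effect u_i is integrated out on an evenly spaced grid whose
    weights are normal densities normalised to sum to one.
    """
    step = 2.0 * width / (n_grid - 1)
    zs = [-width + k * step for k in range(n_grid)]
    dens = [math.exp(-0.5 * z * z) for z in zs]
    total = sum(dens)
    marginal = 0.0
    for z, d in zip(zs, dens):
        # Given u_i = z, family members are conditionally independent.
        cond = 1.0
        for yi, gi in zip(y, g):
            p = logistic(beta0 + beta_g * gi + sigma * z)
            cond *= p if yi == 1 else 1.0 - p
        marginal += (d / total) * cond
    return math.log(marginal)


def total_log_likelihood(families, beta0, beta_g, sigma):
    """Sum of marginal log-likelihoods over independent families."""
    return sum(
        family_log_likelihood(y, g, beta0, beta_g, sigma) for y, g in families
    )


# Made-up illustrative data: (disease status, genotype) per family.
families = [
    ([1, 1, 1], [1, 2, 1]),  # three affected relatives
    ([1, 1], [2, 1]),        # two affected relatives
    ([0], [0]),              # unrelated controls, each a family of size one
    ([0], [1]),
    ([0], [0]),
]

ll_independent = total_log_likelihood(families, -0.5, 0.8, 0.0)
ll_mixed = total_log_likelihood(families, -0.5, 0.8, 1.0)
```

With `sigma = 0` the model reduces to ordinary logistic regression that treats relatives as independent; a positive `sigma` lets affected relatives share unmeasured familial risk, which is the source of the correlation the abstract warns about. In practice the parameters would be estimated by maximising this likelihood with a numerical optimiser, and a covariate such as age would enter as an extra term in the linear predictor.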
