Li Ming, He Zihuai, Schaid Daniel J, Cleves Mario A, Nick Todd G, Lu Qing
Division of Biostatistics, Department of Pediatrics, University of Arkansas for Medical Sciences, Little Rock, Arkansas 72202.
Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109.
Genetics. 2015 May;200(1):69-78. doi: 10.1534/genetics.115.175174. Epub 2015 Mar 5.
Family-based study designs are commonly used in genetic research. They have many desirable features, including robustness to population stratification (PS). With advances in high-throughput technologies and ever-decreasing genotyping costs, it has become common for family studies to examine a large number of variants for association with disease phenotypes. The yield from the analysis of these family-based genetic data can be enhanced by adopting computationally efficient and powerful statistical methods. We propose a general framework of a family-based U-statistic, referred to as family-U, for family-based association studies. Unlike existing parametric methods, the proposed method makes no assumptions about the underlying disease model and can be applied to various phenotypes (e.g., binary and quantitative phenotypes) and pedigree structures (e.g., nuclear families and extended pedigrees). By using only within-family information, it offers robust protection against PS. In the absence of PS, it can also use additional, between-family information to improve power. Through simulations, we demonstrated that family-U attained higher power than a commonly used method, the family-based association test (FBAT), under various disease scenarios. We further illustrated the new method with an application to large-scale family data from the Framingham Heart Study. By using between-family information in addition to within-family information, family-U confirmed a previously reported association of CHRNA5 with nicotine dependence.
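To make the within-family idea concrete, the following is a minimal Python sketch of a generic pairwise U-type statistic computed only over pairs of relatives from the same family. It is not the family-U statistic itself, whose kernel and weights are not given in the abstract. The function names (`within_family_u`, `permutation_pvalue`), the product kernel on family-centered genotypes and phenotypes, and the within-family permutation test are all assumptions made for illustration.

```python
import numpy as np


def within_family_u(family, genotype, phenotype):
    """Average of a pairwise kernel over all within-family pairs.

    Hypothetical kernel (an assumption, not the published family-U kernel):
    for relatives i and j in the same family, multiply their family-centered
    genotypes and their family-centered phenotypes.
    """
    family = np.asarray(family)
    g = np.asarray(genotype, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    total, n_pairs = 0.0, 0
    for fam in np.unique(family):
        idx = np.flatnonzero(family == fam)
        if idx.size < 2:
            continue  # a single member contributes no within-family pair
        # Centering within the family removes between-family mean differences.
        gc = g[idx] - g[idx].mean()
        yc = y[idx] - y[idx].mean()
        kernel = np.outer(gc, gc) * np.outer(yc, yc)
        upper = np.triu_indices(idx.size, k=1)  # each unordered pair once
        total += kernel[upper].sum()
        n_pairs += upper[0].size
    return total / n_pairs if n_pairs else 0.0


def permutation_pvalue(family, genotype, phenotype, n_perm=999, seed=0):
    """One-sided permutation p-value, shuffling genotypes within families.

    Shuffling within families is a simplifying assumption: it ignores the
    exact relationships inside a pedigree.
    """
    rng = np.random.default_rng(seed)
    family = np.asarray(family)
    g = np.asarray(genotype, dtype=float)
    observed = within_family_u(family, g, phenotype)
    groups = [np.flatnonzero(family == fam) for fam in np.unique(family)]
    exceed = 0
    for _ in range(n_perm):
        g_perm = g.copy()
        for idx in groups:
            g_perm[idx] = rng.permutation(g[idx])
        if within_family_u(family, g_perm, phenotype) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Calling `permutation_pvalue(family, genotype, phenotype)` with one entry per individual returns the observed statistic and its permutation p-value. Because genotypes and phenotypes are centered within each family, adding a constant to every phenotype in one family leaves the statistic unchanged, which is the sense in which a within-family construction guards against between-family differences such as PS.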