Dai Wei, Yang Ming, Wang Chaolong, Cai Tianxi
Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, U.S.A.
Computational and Systems Biology, Genome Institute of Singapore, Singapore.
Biometrics. 2017 Sep;73(3):876-884. doi: 10.1111/biom.12643. Epub 2017 Mar 8.
Genome-wide association studies (GWAS) and next generation sequencing studies (NGSS) are often performed in family studies to improve power in identifying genetic variants that are associated with clinical phenotypes. Efficient analysis of genome-wide studies with familial data is challenging because it is difficult to model the shared but unmeasured genetic and/or environmental factors that induce dependence among family members. Existing genetic association testing procedures for family studies largely rely on generalized estimating equations (GEE) or linear mixed-effects (LME) models. These procedures may fail to control the type I error rate when the imposed model assumptions are violated. In this article, we propose the Sequence Robust Association Test (SRAT), a fully rank-based, flexible approach that tests for association between a set of genetic variants and an outcome while accounting for within-family correlation and adjusting for covariates. Compared with existing methods, SRAT has the advantages of allowing for unknown correlation structures and requiring weaker assumptions about the outcome distribution. We provide theoretical justifications for SRAT and show that SRAT includes the well-known Wilcoxon rank sum test as a special case. Extensive simulation studies suggest that SRAT provides better protection against type I error rate inflation and can be much more powerful than existing methods in settings with a skewed outcome distribution. For illustration, we also apply SRAT to familial data from the Framingham Heart Study and Offspring Study to examine the association between an inflammatory marker and a few sets of genetic variants.
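As a point of reference for the special case named in the abstract, the following is a minimal sketch of the classical Wilcoxon rank sum test for two independent groups, using a normal approximation with a tie correction. It is not SRAT itself: it ignores within-family correlation, covariates, and variant sets. The function names and the carrier/non-carrier grouping are illustrative assumptions, not taken from the article.

```python
import math


def midranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_test(carriers, noncarriers):
    """Two-sided Wilcoxon rank sum test (normal approximation).

    Hypothetical setup: `carriers` and `noncarriers` hold outcome values
    for unrelated subjects with and without a variant of interest.
    Returns (W, z, p), where W is the rank sum of the first group.
    """
    pooled = list(carriers) + list(noncarriers)
    n1, n2 = len(carriers), len(noncarriers)
    n = n1 + n2
    ranks = midranks(pooled)
    w = sum(ranks[:n1])

    # Null mean and tie-corrected null variance of W.
    mean_w = n1 * (n + 1) / 2.0
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    var_w = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))

    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return w, z, p
```

Because the statistic depends on the outcomes only through their ranks, replacing an extreme value such as 15.0 with 1500.0 leaves W unchanged as long as the ordering is preserved. This invariance is one reason rank-based tests tend to behave well when the outcome distribution is skewed.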