Wei Changshuai, Li Ming, He Zihuai, Vsevolozhskaya Olga, Schaid Daniel J, Lu Qing
Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan, United States of America; Department of Biostatistics and Epidemiology, University of North Texas Health Science Center, Fort Worth, Texas, United States of America.
Genet Epidemiol. 2014 Dec;38(8):699-708. doi: 10.1002/gepi.21864. Epub 2014 Oct 20.
With advancements in next-generation sequencing technology, a massive amount of sequencing data is being generated, which offers a great opportunity to comprehensively investigate the role of rare variants in the genetic etiology of complex diseases. Nevertheless, high-dimensional sequencing data poses a great challenge for statistical analysis. Association analyses based on traditional statistical methods suffer substantial power loss because of the low frequency of genetic variants and the extremely high dimensionality of the data. We developed a Weighted U Sequencing test, referred to as WU-SEQ, for the high-dimensional association analysis of sequencing data. Based on a nonparametric U-statistic, WU-SEQ makes no assumptions about the underlying disease model or phenotype distribution, and can be applied to a variety of phenotypes. Through simulation studies and an empirical study, we showed that WU-SEQ outperformed a commonly used method, the sequence kernel association test (SKAT), when SKAT's underlying assumptions were violated (e.g., the phenotype followed a heavy-tailed distribution). Even when the assumptions were satisfied, WU-SEQ still attained performance comparable to that of SKAT. Finally, we applied WU-SEQ to sequencing data from the Dallas Heart Study (DHS), and detected an association between ANGPTL4 and very-low-density lipoprotein cholesterol.
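To make the idea of a weighted, rank-based U-statistic concrete, the following is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the kernels, so the rank-based normal scores, the weighted linear genotype kernel, the permutation p-value, and all function names (`rank_normal_scores`, `similarity_weights`, `weighted_u`, `permutation_pvalue`) are illustrative assumptions.

```python
from statistics import NormalDist

import numpy as np


def rank_normal_scores(y):
    """Map phenotypes to normal scores using ranks only (assumed choice).

    Ties are broken by input order, which is adequate for continuous traits.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    ranks = np.argsort(np.argsort(y)) + 1  # ranks 1..n
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf((r - 0.5) / n) for r in ranks])


def similarity_weights(G, variant_weights=None):
    """Pairwise genetic similarity from a weighted linear kernel (assumed).

    G is an (n_subjects, n_variants) matrix of minor-allele counts.
    The diagonal is zeroed so that only pairs i != j contribute.
    """
    G = np.asarray(G, dtype=float)
    if variant_weights is None:
        variant_weights = np.ones(G.shape[1])
    K = (G * variant_weights) @ G.T
    np.fill_diagonal(K, 0.0)
    return K


def weighted_u(y, G, variant_weights=None):
    """U = sum over i != j of w_ij * q_i * q_j, with q the normal scores."""
    q = rank_normal_scores(y)
    K = similarity_weights(G, variant_weights)
    return float(q @ K @ q)


def permutation_pvalue(y, G, n_perm=999, seed=0):
    """One-sided permutation p-value for the weighted U-statistic.

    Permuting the scores breaks any genotype-phenotype link while
    keeping the genetic similarity structure fixed.
    """
    rng = np.random.default_rng(seed)
    q = rank_normal_scores(y)
    K = similarity_weights(G)
    observed = q @ K @ q
    exceed = 0
    for _ in range(n_perm):
        qp = rng.permutation(q)
        if qp @ K @ qp >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Because the statistic in this sketch depends on the phenotype only through its ranks, it is unchanged by any strictly increasing transformation of the trait, which illustrates why a rank-based approach can be robust to heavy-tailed phenotypes. A permutation p-value is used here purely for simplicity.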