Jadhav Sneha, Tong Xiaoran, Lu Qing
Department of Statistics and Probability, Michigan State University, East Lansing, Michigan, United States of America.
Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan, United States of America.
Genet Epidemiol. 2017 Nov;41(7):636-643. doi: 10.1002/gepi.22063. Epub 2017 Aug 29.
Although sequencing studies hold great promise for uncovering novel variants predisposing to human diseases, the high dimensionality of sequencing data poses substantial challenges for data analysis. Moreover, for many complex diseases (e.g., psychiatric disorders), multiple related phenotypes are collected. These phenotypes can be different measurements of an underlying disease, or measurements characterizing multiple related diseases that are studied for common genetic mechanisms. Although jointly analyzing these phenotypes could increase the power to identify disease-associated genes, the mix of phenotype types complicates association analysis. To address these challenges, we propose a nonparametric method, the functional U-statistic (FU) method, for multivariate analysis of sequencing data. It first constructs smooth functions from individuals' sequencing data, and then tests the association of these functions with multiple phenotypes by using a U-statistic. The method provides a general framework for analyzing various types of phenotypes (e.g., binary and continuous phenotypes) with unknown distributions. Fitting the genetic variants within a gene with a smooth function also allows us to capture complexities of gene structure (e.g., linkage disequilibrium, LD), which could increase the power of association analysis. Through simulations, we compared our method with the multivariate outcome score test (MOST) and found that our test attained better performance than MOST. In a real data application, we applied our method to sequencing data from the Minnesota Twin Study (MTS) and found potential associations of several nicotine receptor subunit (CHRN) genes, including CHRNB3, with nicotine dependence and/or alcohol dependence.
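The two-step idea described in the abstract (smooth each individual's genotypes into a function, then relate genetic similarity to phenotype similarity through a U-statistic) can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian kernel smoother, the inner-product similarities, the rank-based phenotype scores, the permutation p-value, and every function name below are assumptions chosen only to keep the example short and dependency-free.

```python
import math
import random


def smooth_genotypes(genotypes, positions, grid, bandwidth=0.1):
    """Kernel-smooth one individual's genotypes (0/1/2) into a curve on `grid`.

    Assumption: a Nadaraya-Watson smoother stands in for whatever basis
    expansion the FU method actually uses.
    """
    curve = []
    for t in grid:
        weights = [math.exp(-0.5 * ((t - s) / bandwidth) ** 2) for s in positions]
        curve.append(sum(w * g for w, g in zip(weights, genotypes)) / sum(weights))
    return curve


def rank_scores(values):
    """Standardized average ranks, so binary and continuous traits share a scale."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):  # tied values receive the same average rank
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    mean = (n + 1) / 2
    sd = math.sqrt(sum((r - mean) ** 2 for r in ranks) / n)
    return [(r - mean) / sd if sd > 0 else 0.0 for r in ranks]


def centered_similarity(rows):
    """Pairwise inner-product similarity after centering each column."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(m)]
    centered = [[r[k] - means[k] for k in range(m)] for r in rows]
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / m
             for j in range(n)] for i in range(n)]


def u_statistic(geno_sim, pheno_sim):
    """Average over all pairs i != j of genetic similarity times phenotype similarity."""
    n = len(geno_sim)
    total = sum(geno_sim[i][j] * pheno_sim[i][j]
                for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


def permutation_pvalue(geno_sim, pheno_sim, n_perm=500, seed=0):
    """One-sided permutation p-value (an assumption; not the paper's null distribution)."""
    rng = random.Random(seed)
    n = len(geno_sim)
    observed = u_statistic(geno_sim, pheno_sim)
    idx = list(range(n))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(idx)  # relabel individuals on the phenotype side only
        permuted = [[pheno_sim[idx[i]][idx[j]] for j in range(n)] for i in range(n)]
        if u_statistic(geno_sim, permuted) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

To use the sketch, one would pass each individual's genotype vector for a gene (with variant positions rescaled to [0, 1]) through `smooth_genotypes`, stack the `rank_scores` of every phenotype into one row per individual, build both similarity matrices with `centered_similarity`, and call `permutation_pvalue`. Rank scores are used here only because they need no distributional assumption, which mirrors the abstract's goal of handling binary and continuous phenotypes together.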