Yang James J, Williams L Keoki, Buu Anne
School of Nursing, University of Michigan, Ann Arbor, Michigan, United States of America.
Department of Internal Medicine, Henry Ford Health System, Detroit, Michigan, United States of America.
PLoS One. 2017 Jan 12;12(1):e0169893. doi: 10.1371/journal.pone.0169893. eCollection 2017.
We propose a multivariate genome-wide association test for mixed continuous, binary, and ordinal phenotypes. A latent response model is used to estimate the correlation between phenotypes measured on different scales, so that the empirical distribution of Fisher's combination statistic under the null hypothesis can be estimated efficiently. A simulation study shows that the proposed correlation estimation methods are highly accurate. More importantly, our approach estimates the variance of the test statistic conservatively, so the type I error rate is controlled. The simulation also shows that the proposed test maintains power at a level very close to that of the ideal analysis based on known latent phenotypes while controlling the type I error. In contrast, conventional approaches (dichotomizing all observed phenotypes or treating them as continuous variables) could either reduce power or rely on a linear regression model unfit for the data. Furthermore, statistical analysis of the Study of Addiction: Genetics and Environment (SAGE) database demonstrates that a multivariate test on multiple phenotypes can increase the power to identify markers that might not otherwise be selected by marginal tests. The proposed method also offers a new approach to analyzing the Fagerström Test for Nicotine Dependence as multivariate phenotypes in genome-wide association studies.
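To make the role of the phenotype correlation concrete, the following is a minimal sketch of how a null distribution for Fisher's combination statistic might be approximated once a correlation matrix is available. It is not the authors' procedure: the abstract does not give the estimation details, so the sketch assumes the correlation matrix `corr` is already known (in the paper it would come from the latent response model), assumes the marginal tests are two-sided z-tests, and uses Monte Carlo draws plus a moment-matched scaled chi-square, all of which are illustrative choices. The function names are hypothetical.

```python
import numpy as np
from scipy import stats


def fisher_statistic(p_values):
    """Fisher's combination statistic T = -2 * sum(log p) over the last axis."""
    p = np.asarray(p_values, dtype=float)
    return -2.0 * np.sum(np.log(p), axis=-1)


def simulate_null_fisher(corr, n_sim=20000, seed=0):
    """Draw T under the null, assuming correlated two-sided z-tests.

    `corr` is taken as given here (an assumption); the paper estimates the
    correlation between mixed-scale phenotypes with a latent response model.
    """
    rng = np.random.default_rng(seed)
    corr = np.asarray(corr, dtype=float)
    z = rng.multivariate_normal(np.zeros(corr.shape[0]), corr, size=n_sim)
    p = 2.0 * stats.norm.sf(np.abs(z))  # two-sided marginal p-values
    return fisher_statistic(p)


def scaled_chi2_pvalue(t_obs, null_draws):
    """Approximate P(T >= t_obs) by matching T to c * chi2(f).

    For c * chi2(f): mean = c * f and variance = 2 * c**2 * f, so
    c = var / (2 * mean) and f = 2 * mean**2 / var.
    """
    mean = null_draws.mean()
    var = null_draws.var(ddof=1)
    scale = var / (2.0 * mean)
    df = 2.0 * mean**2 / var
    return stats.chi2.sf(t_obs / scale, df)


# Hypothetical usage: two phenotypes with an assumed correlation of 0.8.
corr = np.array([[1.0, 0.8], [0.8, 1.0]])
null_draws = simulate_null_fisher(corr)
p_combined = scaled_chi2_pvalue(fisher_statistic([0.01, 0.03]), null_draws)
```

With independent tests the statistic follows a chi-square distribution with 2k degrees of freedom for k phenotypes; positive correlation inflates its variance, which is why an accurate (or conservative) variance estimate matters for type I error control.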