Zhu Huanhuan, Zhang Shuanglin, Sha Qiuying
Department of Mathematical Sciences, Michigan Technological University, Houghton, Mich., USA.
Hum Hered. 2015;80(3):144-52. doi: 10.1159/000446239. Epub 2016 Jun 25.
BACKGROUND/AIMS: Genome-wide association studies (GWAS) have identified many variants that each affect multiple phenotypes, which suggests that pleiotropic effects on human complex phenotypes may be widespread. Statistical methods that jointly analyze multiple phenotypes in GWAS may therefore have advantages over analyzing each phenotype individually. Several such methods have been developed for genetic association studies with multivariate phenotypes; however, their performance under different scenarios is largely unknown. Our goal was to provide researchers with practical guidelines for selecting statistical methods when analyzing real data with multiple phenotypes.
METHODS: In this study, we evaluated the performance of several existing methods for association studies using multiple phenotypes. These methods included the O'Brien method (OB), cross-validation method (CV), optimal weight method (OW), Trait-based Association Test that uses Extended Simes procedure (TATES), principal components of heritability (PCH), canonical correlation analysis (CCA), multivariate analysis of variance (MANOVA), and a joint model of multiple phenotypes (MultiPhen). We used simulation studies to compare the power of these methods under a variety of scenarios, including different numbers of phenotypes, different values of between-phenotype correlation, different minor allele frequencies, and different mean and variance models.
RESULTS: Our simulation results show that no single method performs consistently well across all scenarios. Each method has its own advantages and disadvantages.
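The kind of power comparison described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code and does not implement any of the eight methods exactly. It assumes a compound-symmetric phenotype correlation `rho` that is treated as known, an additive genotype effect, and large-sample normal approximations. It compares two simplified stand-ins: a sum of per-phenotype Z statistics in the spirit of OB, and a Bonferroni-corrected minimum p-value, which is a cruder relative of TATES (TATES itself uses an extended Simes correction). All function names and parameter values are hypothetical choices made for illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def simulate_dataset(rng, n, maf, betas, rho):
    """Simulate additive genotypes and one phenotype column per effect size.

    Assumption: residuals are compound-symmetric with correlation rho,
    built from one shared and one phenotype-specific normal term.
    """
    genotypes = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
    phenos = [[] for _ in betas]
    for g in genotypes:
        shared = rng.gauss(0.0, 1.0)
        for k, beta in enumerate(betas):
            own = rng.gauss(0.0, 1.0)
            noise = math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * own
            phenos[k].append(beta * g + noise)
    return genotypes, phenos


def z_statistic(x, y):
    """t statistic for the slope of y on x, treated as approximately normal."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # no variation, so no evidence either way
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def two_sided_p(z):
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def sum_test_rejects(z, rho, alpha):
    """Sum of Z statistics, scaled by its null standard deviation."""
    k = len(z)
    var_sum = k + k * (k - 1) * rho  # 1' R 1 under compound symmetry
    return two_sided_p(sum(z) / math.sqrt(var_sum)) < alpha


def min_p_rejects(z, alpha):
    """Smallest univariate p-value with a Bonferroni correction."""
    return min(two_sided_p(v) for v in z) < alpha / len(z)


def estimate_power(betas, rho=0.5, n=500, maf=0.3, alpha=0.05, reps=200, seed=1):
    """Fraction of simulated datasets in which each test rejects."""
    rng = random.Random(seed)
    hits = {"sum": 0, "min_p": 0}
    for _ in range(reps):
        genotypes, phenos = simulate_dataset(rng, n, maf, betas, rho)
        z = [z_statistic(genotypes, y) for y in phenos]
        hits["sum"] += sum_test_rejects(z, rho, alpha)
        hits["min_p"] += min_p_rejects(z, alpha)
    return {name: count / reps for name, count in hits.items()}


if __name__ == "__main__":
    scenarios = {
        "same direction": [0.15, 0.15, 0.15, 0.15],
        "opposite directions": [0.15, -0.15, 0.15, -0.15],
    }
    for label, betas in scenarios.items():
        print(label, estimate_power(betas))
```

Under these assumptions, the sum statistic tends to do well when the variant shifts all phenotypes in the same direction and loses most of its power when the effects cancel, whereas the minimum p-value approach is less sensitive to effect direction. This is one simple way to see why no single method would be expected to dominate across scenarios.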