Department of Genetics and Institute for Biomedical Informatics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA.
The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, USA.
BMC Bioinformatics. 2018 Apr 4;19(1):120. doi: 10.1186/s12859-018-2135-0.
Phenome-wide association studies (PheWAS) are a high-throughput approach for comprehensively evaluating associations between genetic variants and a wide range of phenotypic measures. Across the many phenotypes of interest, a PheWAS has varying sample sizes for quantitative traits and varying numbers of cases and controls for binary traits, both of which can affect the statistical power to detect associations. This study investigates the parameters that affect the estimation of statistical power in PheWAS, including sample size, case-control ratio, minor allele frequency, and disease penetrance.
We performed a PheWAS simulation study in which we investigated how statistical power varies with overall sample size, number of cases, case-control ratio, minor allele frequency, and disease penetrance. The simulation was performed on both binary and quantitative phenotypic measures. Our simulation of binary traits suggests that the number of cases has more impact on statistical power than the case-control ratio; we also found that a sample size of 200 cases or more maintains the statistical power to identify associations with common variants. For quantitative traits, a sample size of 1000 or more individuals performed best in the power calculations. We focused on common genetic variants (MAF > 0.01) in this study; in future studies, we will extend this effort to similar simulations of rare variants.
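The simulation idea above can be illustrated with a minimal Monte Carlo sketch for one binary phenotype and one variant. This is not the authors' pipeline: the functions `simulate_power_binary` and `allelic_chi2_pvalue` are hypothetical, and the sketch assumes Hardy-Weinberg equilibrium in the source population, genotype-specific penetrances (f0, f1, f2) for 0, 1, or 2 minor alleles, case-control sampling from that population, and a 1-df allelic chi-square test at level `alpha`. Power is estimated as the fraction of simulated data sets in which the test rejects the null hypothesis.

```python
import math
import random


def allelic_chi2_pvalue(a, b, c, d):
    """1-df chi-square test on a 2x2 allele-count table (hypothetical helper).

    a, b: minor / major allele counts in cases
    c, d: minor / major allele counts in controls
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # degenerate table: no evidence of association
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2))


def simulate_power_binary(n_cases, n_controls, maf, penetrance,
                          alpha=0.05, reps=1000, seed=0):
    """Monte Carlo power estimate for one SNP and one binary trait (sketch).

    penetrance: (f0, f1, f2) = P(disease | 0, 1, 2 minor alleles),
    each assumed to lie strictly between 0 and 1.
    """
    rng = random.Random(seed)
    # Population genotype frequencies under Hardy-Weinberg equilibrium
    hwe = [(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2]
    # Genotype distribution given case / control status (Bayes' rule);
    # random.choices normalizes the weights, so no explicit division is needed.
    case_w = [g * f for g, f in zip(hwe, penetrance)]
    ctrl_w = [g * (1 - f) for g, f in zip(hwe, penetrance)]
    hits = 0
    for _ in range(reps):
        cases = rng.choices((0, 1, 2), weights=case_w, k=n_cases)
        ctrls = rng.choices((0, 1, 2), weights=ctrl_w, k=n_controls)
        a = sum(cases)             # minor alleles in cases
        c = sum(ctrls)             # minor alleles in controls
        b = 2 * n_cases - a        # major alleles in cases
        d = 2 * n_controls - c     # major alleles in controls
        if allelic_chi2_pvalue(a, b, c, d) < alpha:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    # Equal penetrances (no association): the estimate should be near alpha.
    print(simulate_power_binary(200, 200, 0.3, (0.01, 0.01, 0.01), reps=400))
    # Risk increasing with each minor allele: the estimate should be high.
    print(simulate_power_binary(200, 400, 0.3, (0.01, 0.02, 0.04), reps=400))
```

Holding `n_cases` fixed while increasing `n_controls` mimics a change in case-control ratio, and the other arguments map onto minor allele frequency and penetrance. In a real PheWAS, `alpha` would usually be corrected for the number of phenotypes tested (for example, a Bonferroni threshold) rather than left at 0.05.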
This study provides a series of PheWAS simulation analyses that can be used to estimate statistical power in a range of potential scenarios. These results can serve as guidelines for appropriate study design in future PheWAS analyses.