Lin Xin-Qi, Liang Rong, Zhang Jun-Guo, Pi Lu-Cheng, Chen Si-Dong, Liu Li, Gao Yan-Hui
Department of Epidemiology and Biostatistics, School of Public Health, Guangdong Pharmaceutical University, Guangzhou 510310, China; Guangdong Province Hospital for Occupational Disease Prevention and Treatment, Guangzhou 510300, China.
Department of Epidemiology and Biostatistics, School of Public Health, Guangdong Pharmaceutical University, Guangzhou 510310, China.
Yi Chuan. 2018 Feb 20;40(2):162-169. doi: 10.16288/j.yczz.17-174.
Commonly used burden tests differ in statistical performance in genetic association studies of rare variants. Here, we compare the statistical performance of burden tests, including CMC, WST, SUM, and their extensions, using computer-simulated rare-variant datasets that vary in sample size, linkage disequilibrium (LD), and the number of non-associated variants mixed in with the associated ones. The simulations showed that the type I error rate of every method was close to 0.05. When the rare variants had the same direction of effect, power increased with stronger LD and with fewer non-associated variants for all methods except the data-adaptive SUM test. When the directions of effect differed, power was significantly reduced for all methods. Methods that account for effect direction yielded greater power than methods that do not, except under strong LD. Power also increased with sample size. The statistical performance of burden tests is thus affected by a variety of factors, including sample size, the effect direction of variants, the presence of non-associated variants, and LD. Therefore, when choosing a method and setting the variant collapsing unit and weights, prior biological information about the genetic variants should be integrated to improve study efficiency.
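The effect of mixed effect directions on a sum-type burden statistic can be illustrated with a small simulation. The sketch below is not the authors' simulation code and does not implement CMC, WST, or the data-adaptive SUM test; it is a minimal illustration under stated assumptions: variants are independent (no LD), the phenotype follows a logistic model with a baseline log-odds of 0, the burden score is the unweighted count of minor alleles, and significance is judged with a normal-approximation z statistic at |z| > 1.96. The function names (`simulate`, `burden_z`, `estimate_power`) and all parameter values are hypothetical.

```python
import math
import random
import statistics


def simulate(n, maf, betas, rng, baseline=0.0):
    """Simulate n individuals: independent rare variants (0/1/2 minor alleles)
    and a binary phenotype drawn from a logistic model (an assumed model)."""
    genotypes, phenotypes = [], []
    for _ in range(n):
        g = [sum(rng.random() < maf for _ in range(2)) for _ in betas]
        logit = baseline + sum(b * x for b, x in zip(betas, g))
        p_case = 1.0 / (1.0 + math.exp(-logit))
        genotypes.append(g)
        phenotypes.append(1 if rng.random() < p_case else 0)
    return genotypes, phenotypes


def burden_z(genotypes, phenotypes):
    """Z statistic for the difference in mean burden score (unweighted sum of
    minor alleles) between cases and controls, using unequal variances."""
    scores = [sum(g) for g in genotypes]
    cases = [s for s, y in zip(scores, phenotypes) if y == 1]
    controls = [s for s, y in zip(scores, phenotypes) if y == 0]
    if len(cases) < 2 or len(controls) < 2:
        return 0.0  # not enough data to estimate a variance
    se = math.sqrt(statistics.variance(cases) / len(cases)
                   + statistics.variance(controls) / len(controls))
    if se == 0.0:
        return 0.0  # no variation in burden scores within either group
    return (statistics.mean(cases) - statistics.mean(controls)) / se


def estimate_power(betas, n, maf, reps, seed):
    """Fraction of simulated replicates in which |z| exceeds 1.96."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        genotypes, phenotypes = simulate(n, maf, betas, rng)
        if abs(burden_z(genotypes, phenotypes)) > 1.96:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    same = estimate_power([1.0] * 10, n=600, maf=0.01, reps=40, seed=1)
    mixed = estimate_power([1.0] * 5 + [-1.0] * 5, n=600, maf=0.01, reps=40, seed=1)
    print(f"power, same direction:  {same:.2f}")
    print(f"power, mixed direction: {mixed:.2f}")
```

Because the burden score simply adds minor alleles, risk-increasing and protective variants pull the case and control means in opposite directions and largely cancel, which is one way to see why methods that ignore effect direction lose power when directions differ.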