Sun Rui, Deng Qiao, Hu Inchi, Zee Benny Chung-Ying, Wang Maggie Haitian
Division of Biostatistics, School of Public Health and Primary Care, Chinese University of Hong Kong, Shatin, Hong Kong.
Department of ISOM, Hong Kong University of Science and Technology, Clearwater Bay, Hong Kong.
BMC Proc. 2016 Oct 18;10(Suppl 7):153-157. doi: 10.1186/s12919-016-0022-0. eCollection 2016.
With the development of next-generation sequencing technology, the influence of rare variants on complex disease has attracted increasing attention. In this paper, we propose a clustering-based approach, the clustering sum test, to test for rare-variant associations, and we evaluate it on the simulated data provided by Genetic Analysis Workshop 19, which have an unbalanced case-control ratio. The method (a) clusters the control individuals into several subgroups, (b) calculates a statistic for each control subgroup compared with the case group, and (c) combines these into a single statistic based on a distance score. Collapsing of rare variants is used together with the proposed method. In our results, when the same statistical test is compared with and without clustering, the clustering strategy increases the number of true positives identified in the top 100 markers by 17.24%. Compared to the sequence kernel association test, the proposed method is more robust in terms of replication frequency across the replicate data sets. The results suggest that the clustering approach could improve the power of nonparametric tests and that the clustering sum test has the potential to serve as a practical tool for analyzing rare variants in genome-wide case-control studies with unbalanced case-control data.
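The three steps (a) to (c) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm, the per-subgroup statistic, or the distance score, so the sketch assumes a small k-means on control genotypes, a Pearson chi-square on collapsed carrier status, and a pluggable `weights` argument (equal weights by default) standing in for the distance-based combination. All function names are hypothetical.

```python
import random

def collapse(genotypes):
    """Collapse rare variants: 1 if an individual carries any minor allele, else 0."""
    return [1 if any(count > 0 for count in row) else 0 for row in genotypes]

def kmeans(points, k, iters=20, seed=0):
    """Tiny k-means (assumed clustering step); returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Recompute centroids; an empty cluster keeps its previous centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]]; 0.0 if any margin is empty."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def clustering_sum(case_geno, control_geno, k=3, weights=None, seed=0):
    """Sum of per-subgroup statistics, each comparing all cases with one control subgroup."""
    case_carrier = collapse(case_geno)
    labels = kmeans(control_geno, k, seed=seed)          # step (a)
    weights = weights or [1.0] * k                       # placeholder for the distance score
    total = 0.0
    for c in range(k):
        subgroup = [row for row, lab in zip(control_geno, labels) if lab == c]
        if not subgroup:
            continue
        sub_carrier = collapse(subgroup)
        a, b = sum(case_carrier), len(case_carrier) - sum(case_carrier)
        cc, d = sum(sub_carrier), len(sub_carrier) - sum(sub_carrier)
        total += weights[c] * chi_square_2x2(a, b, cc, d)  # steps (b) and (c)
    return total
```

Each control subgroup is closer in size to the case group than the full control set is, which is the intuition behind applying clustering to unbalanced data. A real analysis would replace the equal weights with the distance score defined in the full paper.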