Lacour André, Schüller Vitalia, Drichel Dmitriy, Herold Christine, Jessen Frank, Leber Markus, Maier Wolfgang, Noethen Markus M, Ramirez Alfredo, Vaitsiakhovich Tatsiana, Becker Tim
German Center for Neurodegenerative Diseases (DZNE), Sigmund-Freud-Str. 25, Bonn, 53127, Germany.
Abteilung für Psychiatrie und Psychotherapie, Universitätsklinikum Bonn, Sigmund-Freud-Str. 25, Bonn, 53127, Germany.
BMC Bioinformatics. 2015 Mar 14;16:84. doi: 10.1186/s12859-015-0521-4.
A commonly encountered problem in association studies is population stratification. In this work, we propose a novel framework for population matching in genome-wide and sequencing association studies. We employ pairwise and groupwise optimal case-control matching and present an agglomerative hierarchical clustering method, both based on a genetic similarity score matrix. To ensure that the matches produced by the matching algorithm correctly capture the population structure, we propose and discuss two stratum validation methods. We also devise an extension of the Cochran-Armitage trend test that explicitly takes the population structure into account.
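To make the idea of a structure-aware trend test concrete, the following is a minimal sketch of the standard Cochran-Armitage trend statistic, pooled across strata in a generic Mantel-style fashion. It is not the extension proposed in this work, whose details are not given here. The function names `trend_score` and `stratified_trend_test`, the additive weights (0, 1, 2), and the toy genotype counts are all assumptions made for illustration.

```python
import math

def trend_score(cases, controls, weights=(0, 1, 2)):
    """Score and null variance of the Cochran-Armitage trend statistic
    for one 2x3 table of genotype counts (illustrative helper)."""
    R, S = sum(cases), sum(controls)
    N = R + S
    if R == 0 or S == 0:
        return 0.0, 0.0  # a stratum without both groups is uninformative
    n = [r + s for r, s in zip(cases, controls)]
    sum_tr = sum(t * r for t, r in zip(weights, cases))
    sum_tn = sum(t * k for t, k in zip(weights, n))
    sum_t2n = sum(t * t * k for t, k in zip(weights, n))
    # Observed minus expected weighted genotype count among cases
    score = sum_tr - R * sum_tn / N
    variance = R * S * (N * sum_t2n - sum_tn ** 2) / N ** 3
    return score, variance

def stratified_trend_test(strata, weights=(0, 1, 2)):
    """Sum per-stratum scores and variances, then form one Z statistic.
    A generic pooling scheme, assumed here for illustration only."""
    total_score = 0.0
    total_var = 0.0
    for cases, controls in strata:
        u, v = trend_score(cases, controls, weights)
        total_score += u
        total_var += v
    if total_var == 0:
        return 0.0, 1.0
    z = total_score / math.sqrt(total_var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p

# Made-up counts: (cases, controls) per stratum, genotypes coded 0/1/2.
# Within each stratum cases and controls have the same genotype
# frequencies, but allele frequency and case fraction differ by stratum.
strata = [((64, 32, 4), (16, 8, 1)), ((1, 8, 16), (4, 32, 64))]
pooled = [((65, 40, 20), (20, 40, 65))]

print("stratified:", stratified_trend_test(strata))
print("pooled:    ", stratified_trend_test(pooled))
```

In this toy data, ignoring the strata produces a strong spurious trend, while the stratified statistic is zero because there is no association within either stratum. With a single stratum the function reduces to the ordinary Cochran-Armitage test.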
We assess our framework by simulating genotype data under the null hypothesis and confirm that it correctly controls the type-1 error rate. A power study shows that structured association testing with our framework has reasonable power. We compare our results with those obtained from a logistic regression model with principal component covariates. Using the principal component approach, we also find a possible false-positive association with Alzheimer's disease that is supported neither by our new methods, nor by the results of a recent large meta-analysis, nor by a mixed model approach.
Matching methods offer an alternative way to handle confounding due to population stratification in statistical tests for which covariates are hard to model. As a benchmark, we show that our matching framework performs as well as state-of-the-art models on common variants.