Li Meijuan, Reilly Cavan, Hanson Timothy
Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455-0378, USA.
Bioinformatics. 2008 Oct 15;24(20):2356-62. doi: 10.1093/bioinformatics/btn455. Epub 2008 Aug 27.
Although population-based association mapping may be subject to the bias caused by population stratification, alternative methods that are robust to population stratification such as family-based linkage analysis have lower mapping resolution. Recently, various statistical methods robust to population stratification were proposed for association studies, using unrelated individuals to identify associations between candidate genes and traits of interest. The association between a candidate gene and a quantitative trait is often evaluated via a regression model with inferred population structure variables as covariates, where the residual distribution is customarily assumed to be from a symmetric and unimodal parametric family, such as a Gaussian, although this may be inappropriate for the analysis of many real-life datasets.
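As a rough illustration of the customary approach described above, the sketch below regresses a quantitative trait on a candidate-marker genotype score with inferred population structure variables as covariates, under the usual Gaussian-residual assumption. It is not the authors' implementation: the function name `structured_association_ols`, the toy data, and the use of ordinary least squares are assumptions made only for this example.

```python
import numpy as np


def structured_association_ols(trait, genotype, structure):
    """Hypothetical helper: OLS of trait on intercept, genotype score and
    structure covariates. Returns (genotype effect estimate, t statistic).

    `structure` holds inferred population structure covariates. If these are
    membership proportions that sum to 1, one column must be dropped first,
    otherwise they are collinear with the intercept.
    """
    trait = np.asarray(trait, dtype=float)
    n = trait.shape[0]
    # Design matrix: intercept, genotype score, structure covariate(s).
    X = np.column_stack([
        np.ones(n),
        np.asarray(genotype, dtype=float),
        np.asarray(structure, dtype=float),
    ])
    beta, _, _, _ = np.linalg.lstsq(X, trait, rcond=None)
    resid = trait - X @ beta
    dof = n - X.shape[1]
    sigma2 = resid @ resid / dof              # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)     # covariance of the estimates
    return beta[1], beta[1] / np.sqrt(cov[1, 1])


if __name__ == "__main__":
    # Toy data (assumed): genotype scores 0/1/2 and one ancestry proportion.
    g = [0, 1, 2, 0, 1, 2, 0, 1]
    q = [0.1, 0.2, 0.9, 0.8, 0.5, 0.4, 0.3, 0.7]
    noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05]
    y = [2.0 + 0.5 * gi + 3.0 * qi + e for gi, qi, e in zip(g, q, noise)]
    effect, t_stat = structured_association_ols(y, g, q)
    print(f"genotype effect = {effect:.3f}, t = {t_stat:.2f}")
```

The t statistic here is only meaningful when the residuals are roughly Gaussian, which is the assumption the abstract questions for many real datasets.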
In this article, we propose a new structured association (SA) test. Our method corrects for continuous population stratification in two steps. It first derives population structure and kinship matrices from a set of random genetic markers. It then models the relationship between trait values, genotypic scores at a candidate marker and genetic background variables through a semiparametric model, in which the error distribution is a mixture of Polya trees centered around a normal family of distributions. We compare our model to the existing SA tests in terms of model fit, type I error rate, power, precision and accuracy, using a real dataset as well as simulated datasets.
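One way to write down a model of the kind described above is sketched below. The abstract does not give the model's notation, so every symbol here is an assumption: y_i is the trait value, x_i the genotypic score at the candidate marker, q_i the vector of population structure covariates, u_i a polygenic effect tied to the kinship matrix K, and c the Polya tree weight parameter. The scaling of K and the priors on the variance parameters are left unspecified.

```latex
\begin{aligned}
y_i &= \beta_0 + x_i \beta + \mathbf{q}_i^{\top} \mathbf{v} + u_i + \varepsilon_i,
  \qquad i = 1, \dots, n, \\
\mathbf{u} = (u_1, \dots, u_n)^{\top} &\sim N\!\left(\mathbf{0},\; \sigma_u^{2} K\right), \\
\varepsilon_i \mid G &\overset{\text{iid}}{\sim} G, \\
G \mid \sigma^{2} &\sim \mathrm{PT}\!\left(c,\; N(0, \sigma^{2})\right),
  \qquad \sigma^{2} \sim p(\sigma^{2}).
\end{aligned}
```

Placing a prior on \sigma^{2} is what makes this a mixture of Polya trees centered on the normal family. Under this sketch, the customary Gaussian model is the special case in which G is fixed at N(0, \sigma^{2}) instead of being drawn from a Polya tree centered on it.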