Nock NL, Zhang LX
Department of Epidemiology and Biostatistics, Case Western Reserve University, 2103 Cornell Road, Cleveland, OH 44106-7281, USA.
BMC Proc. 2011 Nov 29;5 Suppl 9(Suppl 9):S47. doi: 10.1186/1753-6561-5-S9-S47.
Methods that can evaluate aggregate effects of rare and common variants are limited. Therefore, we applied a two-stage approach to evaluate aggregate gene effects in the 1000 Genomes Project data, which contain 24,487 single-nucleotide polymorphisms (SNPs) in 697 unrelated individuals from 7 populations. In stage 1, we identified potentially interesting genes (PIGs) as those having at least one SNP meeting Bonferroni correction using univariate, multiple regression models. In stage 2, we evaluated aggregate PIG effects on the trait Q1 by modeling each gene as a latent construct, defined by multiple common and rare variants, using the multivariate statistical framework of structural equation modeling (SEM). In stage 1, we found that PIGs varied markedly between a randomly selected replicate (replicate 137) and 100 other replicates, with the exception of FLT1. Also in stage 1, collapsing rare variants decreased false positives but increased false negatives. In stage 2, we developed a good-fitting SEM model that included all nine genes simulated to affect Q1 (FLT1, KDR, ARNT, ELAV4, FLT4, HIF1A, HIF3A, VEGFA, VEGFC) and found that FLT1 had the largest effect on Q1 (βstd = 0.33 ± 0.05). Using replicate 137 estimates as population values, we found that the mean relative bias in the parameters (loadings, paths, residuals) and their standard errors across 100 replicates was, on average, less than 5%. Our latent variable SEM approach provides a viable framework for modeling aggregate effects of rare and common variants in multiple genes, but more elegant methods are needed in stage 1 to minimize type I and type II error.
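The stage 1 screen described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes one simple linear regression of the trait on each SNP with no covariates, approximates the slope test with a normal distribution (reasonable for samples in the hundreds), and runs on simulated toy data. The names `slope_p_value`, `screen_genes`, `GENE_A` to `GENE_C`, and the SNP labels are all hypothetical.

```python
import math
import random


def slope_p_value(x, y):
    """Two-sided p-value for the slope of a simple linear regression y ~ x.

    Assumption: the t statistic is referred to a standard normal
    distribution, which is a large-sample approximation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        return 1.0  # monomorphic SNP: no slope can be estimated
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    rss = sum((yi - mean_y - beta * (xi - mean_x)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return 0.0  # perfect fit
    z = beta / se
    return math.erfc(abs(z) / math.sqrt(2))


def screen_genes(genotypes, snp_to_gene, trait, alpha=0.05):
    """Return genes with at least one SNP passing a Bonferroni threshold.

    genotypes: dict mapping SNP id -> list of minor-allele counts (0, 1, 2).
    snp_to_gene: dict mapping SNP id -> gene name.
    """
    threshold = alpha / len(genotypes)  # Bonferroni: alpha / number of tests
    flagged = set()
    for snp, counts in genotypes.items():
        if slope_p_value(counts, trait) < threshold:
            flagged.add(snp_to_gene[snp])
    return flagged


# Toy data (simulated here, not taken from the study).
rng = random.Random(0)
n_individuals = 697


def simulate_snp(maf):
    """Draw minor-allele counts as the sum of two Bernoulli(maf) alleles."""
    return [(rng.random() < maf) + (rng.random() < maf) for _ in range(n_individuals)]


genotypes = {
    "snp1": simulate_snp(0.30),
    "snp2": simulate_snp(0.20),
    "snp3": simulate_snp(0.10),
    "snp4": simulate_snp(0.25),
}
snp_to_gene = {"snp1": "GENE_A", "snp2": "GENE_A", "snp3": "GENE_B", "snp4": "GENE_C"}

# Only snp1 has a simulated effect on the trait.
trait = [1.0 * g + rng.gauss(0, 1) for g in genotypes["snp1"]]

pigs = screen_genes(genotypes, snp_to_gene, trait)
print(sorted(pigs))
```

Because the threshold divides alpha by the number of SNPs tested, the screen becomes more conservative as the marker set grows, which is one way false negatives can arise at this stage.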
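The replicate comparison in the abstract relies on relative bias. The abstract does not define the quantity, so the sketch below assumes a common definition: the mean estimate across replicates minus the population value, divided by the population value. The function name `relative_bias` is hypothetical, and the replicate estimates are made-up numbers for illustration only. Only the population value of 0.33 (the FLT1 standardized effect from replicate 137) comes from the text.

```python
from statistics import mean


def relative_bias(estimates, population_value):
    """Relative bias of a parameter across replicates.

    Assumed definition: (mean estimate - population value) / population value.
    """
    if population_value == 0:
        raise ValueError("relative bias is undefined for a population value of 0")
    return (mean(estimates) - population_value) / population_value


# Population value taken from the abstract (FLT1 effect in replicate 137).
population_beta = 0.33

# Hypothetical estimates from four replicates; NOT results from the study.
replicate_betas = [0.30, 0.34, 0.36, 0.33]

bias = relative_bias(replicate_betas, population_beta)
print(f"relative bias: {bias:.2%}")
```

Under this definition, a value below 0.05 in absolute terms corresponds to the "less than 5%" criterion mentioned in the abstract.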