Alternative methods for H1 simulations in genome-wide association studies.

Author information

Perduca V, Sinoquet C, Mourad R, Nuel G

Affiliation

MAP5 - UMR CNRS 8145, Université Paris Descartes, Paris, France. vittorio.perduca@parisdescartes.fr

Publication information

Hum Hered. 2012;73(2):95-104. doi: 10.1159/000336194. Epub 2012 Mar 28.

DOI:10.1159/000336194

PMID:22472690

Abstract

OBJECTIVE

Assessing the statistical power to detect susceptibility variants plays a critical role in genome-wide association (GWA) studies, from both the prospective and the retrospective points of view. Power is estimated empirically by simulating phenotypes under a disease model H1. For this purpose, the gold standard consists of simulating genotypes given the phenotypes (e.g. Hapgen). We introduce here an alternative approach for simulating phenotypes under H1 that does not require generating new genotypes for each simulation.
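The empirical power estimate described above can be sketched as the fraction of H1 simulations in which a test statistic exceeds a rejection threshold. The following is only an illustrative sketch: the function names (`estimate_power`, `toy_simulate`, `mean_genotype_difference`), the toy genotype vector, the penetrance values and the threshold are all assumptions made for this example and do not come from the study.

```python
import random

# Hypothetical toy data: genotypes (0/1/2 minor-allele counts) at one SNP.
GENOTYPES = [0, 1, 2, 0, 1, 2, 0, 1, 2, 1, 0, 2]


def toy_simulate(rng):
    """Draw one phenotype vector under an assumed H1 (risk rises with genotype)."""
    return [1 if rng.random() < 0.2 + 0.3 * g else 0 for g in GENOTYPES]


def mean_genotype_difference(phenotypes):
    """Toy statistic: mean genotype in cases minus mean genotype in controls."""
    cases = [g for g, y in zip(GENOTYPES, phenotypes) if y == 1]
    controls = [g for g, y in zip(GENOTYPES, phenotypes) if y == 0]
    if not cases or not controls:
        return 0.0
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def estimate_power(simulate_h1, statistic, threshold, n_sim=1000, seed=0):
    """Fraction of H1 simulations whose statistic exceeds the threshold."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        if statistic(simulate_h1(rng)) > threshold:
            rejections += 1
    return rejections / n_sim


# The threshold 0.5 is arbitrary here; in practice it would be calibrated under H0.
power = estimate_power(toy_simulate, mean_genotype_difference, threshold=0.5, n_sim=200)
```

Note that the genotypes stay fixed across simulations and only the phenotypes are redrawn, which is the setting the abstract proposes.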

METHODS

To simulate phenotypes with a fixed total number of cases under a given disease model, we suggest 3 algorithms: (1) a simple rejection algorithm; (2) a numerical Markov chain Monte Carlo (MCMC) approach, and (3) an exact and efficient backward sampling algorithm. In our study, we validated the 3 algorithms both on a simulated dataset and by comparing them with Hapgen on a more realistic dataset. As an application, we then conducted a simulation study on a 1000 Genomes Project dataset consisting of 629 individuals (314 cases) and 8,048 SNPs from chromosome X. We arbitrarily defined an additive disease model with two susceptibility SNPs and an epistatic effect.
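As a rough illustration of two of these ideas, the sketch below draws a phenotype vector with a fixed number of cases from per-individual case probabilities. It is not the waffect implementation: the logistic form of the disease model, the coefficient values and all function names are assumptions made for this example. The exact sampler uses a standard dynamic-programming construction (a table of partial-sum probabilities, then sampling individuals from last to first), which is one plausible reading of "backward sampling".

```python
import math
import random


def case_probabilities(genotypes, beta0, beta1, beta2, beta12):
    """Assumed logistic model: two SNPs (coded 0/1/2) plus an interaction term."""
    probs = []
    for g1, g2 in genotypes:
        eta = beta0 + beta1 * g1 + beta2 * g2 + beta12 * g1 * g2
        probs.append(1.0 / (1.0 + math.exp(-eta)))
    return probs


def rejection_sample(probs, n_cases, rng, max_tries=100000):
    """Draw independent Bernoulli phenotypes until exactly n_cases are cases."""
    for _ in range(max_tries):
        y = [1 if rng.random() < p else 0 for p in probs]
        if sum(y) == n_cases:
            return y
    raise RuntimeError("no accepted draw within max_tries")


def backward_sample(probs, n_cases, rng):
    """Exact draw of independent Bernoullis conditioned on their sum."""
    n = len(probs)
    # forward[i][k] = P(Y_1 + ... + Y_i = k) for independent Bernoulli(p_i).
    forward = [[0.0] * (n_cases + 1) for _ in range(n + 1)]
    forward[0][0] = 1.0
    for i, p in enumerate(probs, start=1):
        for k in range(n_cases + 1):
            forward[i][k] = (1.0 - p) * forward[i - 1][k]
            if k > 0:
                forward[i][k] += p * forward[i - 1][k - 1]
    if forward[n][n_cases] == 0.0:
        raise ValueError("n_cases has probability zero under these probabilities")
    y = [0] * n
    k = n_cases  # number of cases still to be placed among individuals 1..i
    for i in range(n, 0, -1):
        if k == 0:
            break  # all remaining individuals are controls
        # P(Y_i = 1 | Y_1 + ... + Y_i = k)
        p_one = probs[i - 1] * forward[i - 1][k - 1] / forward[i][k]
        if rng.random() < p_one:
            y[i - 1] = 1
            k -= 1
    return y


rng = random.Random(1)
toy_genotypes = [(0, 0), (1, 0), (2, 1), (1, 2), (0, 1), (2, 2)]  # hypothetical
probs = case_probabilities(toy_genotypes, -1.0, 0.3, 0.3, 0.2)
y_exact = backward_sample(probs, 3, rng)
y_reject = rejection_sample(probs, 3, rng)
```

The rejection sampler can need many tries when the required number of cases is unlikely under the independent model, whereas the table-based sampler costs on the order of (number of individuals) × (number of cases) operations per table and then one pass per draw.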

RESULTS

The 3 algorithms are consistent, but backward sampling is dramatically faster than the other two. Our approach also gives results consistent with Hapgen. Using our application data, we showed that our limited design requires prior biological knowledge to restrict the investigated region. We also showed that epistatic effects can play a significant role even when simple marker statistics (e.g. trend) are used. Finally, we showed that the overall performance of a GWA study strongly depends on the prevalence of the disease: the larger the prevalence, the better the power.

CONCLUSIONS

Our approach is a valid alternative to Hapgen-type methods; it is not only dramatically faster but also has 2 main advantages: (1) there is no need for sophisticated genotype models (e.g. haplotype frequencies or recombination rates), and (2) the choice of the disease model is completely unconstrained (number of SNPs involved, gene-environment interactions, hybrid genetic models, etc.). Our 3 algorithms are available in an R package called 'waffect' ('double-u affect', for weighted affectations).
