Kuo Po-Hsiu, Bukszár József, van den Oord Edwin J C G
Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, 800 East Leigh Street, Biotech 1, VIPBG, Suite 1-130, Richmond, Virginia 23219, USA.
BMC Proc. 2007;1 Suppl 1(Suppl 1):S143. doi: 10.1186/1753-6561-1-S1-S143. Epub 2007 Dec 18.
It has recently become possible to screen thousands of markers to detect genetic causes of common diseases. Along with this potential come analytical challenges, and it is important to develop new statistical tools to identify markers with causal effects and to estimate their effect sizes accurately. Knowledge of the proportion of markers without true effects (p0) and of the effect sizes of markers with effects provides information for controlling false discoveries and designing follow-up studies. We apply newly developed methods to the simulated Genetic Analysis Workshop 15 genome-wide case-control data sets: a maximum likelihood (ML) and a quasi-ML (QML) approach, which incorporate the test statistic distribution and estimate effect size simultaneously with p0, and two conservative estimators of p0 that do not rely on the test statistic distribution under the alternative. Compared with four commonly used existing estimators of p0, all of our estimators had favorable properties in terms of the standard deviation with which p0 is estimated. On average, the ML method performed slightly better than the QML method; the conservative method performed well, was even slightly more precise than the ML estimators, and can be more robust in less optimal conditions (small sample sizes and small numbers of markers). Further improvements and extensions of the proposed methods are conceivable, such as estimating the distribution of effect sizes and taking population stratification into account when obtaining estimates of p0 and effect size.
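To make the quantity p0 concrete, the following is a minimal Python sketch of one widely used existing type of estimator, the Storey-type estimator based on p-values above a threshold. It is not the ML, QML, or conservative estimator described in the abstract, whose details are not given here. The function names, the threshold `lam`, the normal test statistics, and the simulation settings (10,000 markers, true p0 of 0.9, a mean shift of 3 for markers with effects) are all illustrative assumptions.

```python
import random
from statistics import NormalDist


def simulate_p_values(m, p0, effect, seed=0):
    """Simulate m two-sided p-values (hypothetical setup, not the GAW15 data).

    A fraction p0 of markers has no effect (z ~ N(0, 1)); the rest have
    test statistics shifted by `effect` (z ~ N(effect, 1)).
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    n_null = round(m * p0)
    p_values = []
    for i in range(m):
        shift = 0.0 if i < n_null else effect
        z = rng.gauss(shift, 1.0)
        p_values.append(2.0 * (1.0 - std_normal.cdf(abs(z))))
    return p_values


def estimate_p0(p_values, lam=0.5):
    """Storey-type estimate of the proportion of markers without effects.

    Null p-values are uniform on [0, 1], so a fraction (1 - lam) of them
    is expected above lam. Markers with effects rarely land there, which
    makes the estimate slightly conservative (biased upward).
    """
    m = len(p_values)
    n_above = sum(1 for p in p_values if p > lam)
    return min(1.0, n_above / (m * (1.0 - lam)))


p_values = simulate_p_values(m=10000, p0=0.9, effect=3.0, seed=1)
p0_hat = estimate_p0(p_values)
print(f"estimated p0: {p0_hat:.3f}")
```

Repeating the simulation over many seeds and taking the standard deviation of `p0_hat` would give the kind of precision measure the abstract uses to compare estimators.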