Universal false discovery rate estimation methodology for genome-wide association studies.

Author information

Forner Karl, Lamarine Marc, Guedj Mickaël, Dauvillier Jérôme, Wojcik Jérôme

Affiliation

Merck Serono, Geneva Research Center, Geneva, Switzerland.

Publication information

Hum Hered. 2008;65(4):183-94. doi: 10.1159/000112365. Epub 2007 Dec 11.

DOI:10.1159/000112365

PMID:18073488

Abstract

Genome-wide case-control association studies aim to identify markers that differ significantly between affected and healthy populations. With the development of large-scale technologies allowing the genotyping of thousands of single nucleotide polymorphisms (SNPs) comes the multiple testing problem and the practical issue of selecting the most probable set of associated markers. Several False Discovery Rate (FDR) estimation methods have been developed and tuned, mainly for differential gene expression studies. However, they rely on hypotheses and designs that are not necessarily relevant in genetic association studies. In this article we present a universal methodology to estimate the FDR of genome-wide association results. It uses a single global probability value per SNP and is applicable in practice to any study design and any statistic. We have benchmarked this algorithm on simulated data and shown that it outperforms previous methods in cases requiring non-parametric estimation. We illustrate the usefulness of the method by applying it to experimental genotyping data from three Multiple Sclerosis case-control association studies.
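
To make the idea of estimating an FDR from one p-value per SNP concrete, here is a minimal Python sketch. It uses the standard Benjamini-Hochberg step-up adjustment, which is not the algorithm of this article (the abstract does not describe that algorithm); it only illustrates the kind of input and output involved. The function name `bh_qvalues`, the SNP names, and the p-values are all hypothetical and made up for illustration.

```python
def bh_qvalues(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, one per input p-value.

    Illustration only: this is the classic BH adjustment, swapped in as a
    stand-in, not the method proposed in the article.
    """
    m = len(pvalues)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping the running
    # minimum of p * m / rank so adjusted values are monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical per-SNP p-values (assumed data, not from the study).
snp_pvalues = {
    "snp_a": 0.0001,
    "snp_b": 0.004,
    "snp_c": 0.02,
    "snp_d": 0.2,
    "snp_e": 0.9,
}
names = list(snp_pvalues)
q = bh_qvalues([snp_pvalues[n] for n in names])

# Keep the SNPs whose adjusted p-value is at most 0.05.
selected = [n for n, qv in zip(names, q) if qv <= 0.05]
print(selected)
```

Any test statistic can feed such a procedure as long as it yields one p-value per SNP, which is the property the abstract emphasizes. The BH adjustment controls the FDR under independence or positive dependence of the tests, an assumption that may not hold for SNPs in linkage disequilibrium.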
