A robust and efficient statistical method for genetic association studies using case and control samples from multiple cohorts.

Affiliation

Department of Biostatistics and Computational Biology, State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, 200433, Shanghai, China.

Publication details

BMC Genomics. 2013 Feb 8;14:88. doi: 10.1186/1471-2164-14-88.

DOI:10.1186/1471-2164-14-88

PMID:23394771

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3626840/

Abstract

BACKGROUND

The theoretical basis of genome-wide association studies (GWAS) is statistical inference of linkage disequilibrium (LD) between any polymorphic marker and a putative disease locus. Most methods widely used for such analyses are vulnerable to several key demographic factors, and as a result they have poor statistical power to detect genuine associations and a high false-positive rate. Here, we present a likelihood-based statistical approach that properly accounts for the non-random nature of case-control samples with respect to the genotypic distribution at the loci in the populations under study. It also provides the flexibility to test for genetic association in the presence of confounding factors such as population structure and non-random sampling.
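The LD that the method targets can be quantified with the standard pairwise measures D, Lewontin's D′ and r². The sketch below computes them from one haplotype frequency and two allele frequencies. It assumes those frequencies are already known (in a case-control study they must themselves be estimated), and the function name `ld_measures` is hypothetical. It only illustrates the quantity being inferred; it is not the authors' likelihood-based estimation procedure.

```python
def ld_measures(p_AB: float, p_A: float, p_B: float) -> tuple[float, float, float]:
    """Return (D, D', r^2) for a marker allele A and a putative disease allele B.

    p_AB: frequency of the haplotype carrying both A and B
    p_A, p_B: allele frequencies, each strictly between 0 and 1
    """
    if not (0 < p_A < 1 and 0 < p_B < 1):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    if not (max(0.0, p_A + p_B - 1) <= p_AB <= min(p_A, p_B)):
        raise ValueError("haplotype frequency inconsistent with allele frequencies")
    p_a, p_b = 1 - p_A, 1 - p_B
    D = p_AB - p_A * p_B                    # LD coefficient
    # Lewontin's normalisation: D_max depends on the sign of D
    if D >= 0:
        d_max = min(p_A * p_b, p_a * p_B)
    else:
        d_max = min(p_A * p_B, p_a * p_b)
    d_prime = D / d_max                     # signed D'; |D'| is often reported
    r2 = D * D / (p_A * p_a * p_B * p_b)    # squared allelic correlation
    return D, d_prime, r2


if __name__ == "__main__":
    print(ld_measures(0.5, 0.5, 0.5))  # -> (0.25, 1.0, 1.0): complete LD
```

D and r² are zero when the marker and the disease locus are in linkage equilibrium, which is why association at a marker is read as evidence of LD with a nearby disease locus.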

RESULTS

We applied this novel method, together with several methods popular in the GWAS literature, to re-analyze recently published Parkinson's disease (PD) case-control samples. The real-data analysis and computer simulations show that, compared with its rivals, the new method offers not only significantly improved statistical power for detecting associations but also robustness to the difficulties arising from non-random sampling and genetic structure. In particular, the new method detected 44 significant SNPs within 25 chromosomal regions, each smaller than 1 Mb, whereas trend-test-based methods had previously detected only 6 SNPs in two of these regions. It also discovered two SNPs located 1.18 Mb and 0.18 Mb from the PD candidate genes FGF20 and PARK8, respectively, without increasing the false-positive risk.

CONCLUSIONS

We developed a novel likelihood-based method that adequately estimates LD and other population model parameters from case and control samples, readily integrates such samples from multiple genetically divergent populations, and thus enables statistically robust and powerful GWAS analyses. On the basis of simulation studies and analysis of real datasets, we demonstrated that the new method significantly improves on the non-parametric trend test, the method most widely used in the GWAS literature.
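For reference, the baseline the authors compare against can be sketched as follows, assuming the trend test meant here is the widely used Cochran-Armitage test on a 2×3 genotype table with additive weights (0, 1, 2). This is the textbook statistic, not the authors' likelihood-based method, and the function name and interface are hypothetical.

```python
import math
from typing import Sequence


def cochran_armitage_trend(cases: Sequence[int], controls: Sequence[int],
                           weights: Sequence[float] = (0, 1, 2)) -> tuple[float, float]:
    """Return (chi-square statistic with 1 df, two-sided p-value).

    cases, controls: genotype counts, e.g. (aa, Aa, AA)
    weights: genotype scores; (0, 1, 2) corresponds to an additive model
    """
    if not (len(cases) == len(controls) == len(weights)):
        raise ValueError("cases, controls and weights must have equal length")
    R, S = sum(cases), sum(controls)          # numbers of cases and controls
    if R == 0 or S == 0:
        raise ValueError("need at least one case and one control")
    N = R + S
    col = [r + s for r, s in zip(cases, controls)]   # genotype totals N_i
    # Score statistic: T = sum_i t_i (r_i * S - s_i * R)
    T = sum(t * (r * S - s * R) for t, r, s in zip(weights, cases, controls))
    # Null variance of T conditional on the table margins
    k = len(weights)
    var_sum = sum(weights[i] ** 2 * col[i] * (N - col[i]) for i in range(k))
    var_sum -= 2 * sum(weights[i] * weights[j] * col[i] * col[j]
                       for i in range(k) for j in range(i + 1, k))
    var_T = R * S / N * var_sum
    if var_T <= 0:
        raise ValueError("degenerate table: variance of T is zero")
    chi2 = T * T / var_T
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


if __name__ == "__main__":
    chi2, p = cochran_armitage_trend([10, 20, 30], [30, 20, 10])
    print(chi2, p)  # chi2 is 20.0 for this table
```

The test compares only the genotype-score trend between cases and controls, so it makes no use of a population genetic model; stratification or non-random sampling across cohorts is not modelled unless handled separately.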
