Department of Biochemistry, The University of Hong Kong, Hong Kong, China.
PLoS One. 2010 Dec 31;5(12):e14480. doi: 10.1371/journal.pone.0014480.
BACKGROUND: Genome-wide association studies (GWAS) are moving into a second wave of analysis, characterized by comprehensive bioinformatic and statistical evaluation of genetic associations. Existing biological knowledge is valuable for GWAS because it may improve detection power, particularly for disease susceptibility loci of moderate effect size. A challenging question, however, is how to use the available, highly heterogeneous resources to evaluate statistical significance quantitatively.
METHODOLOGY/PRINCIPAL FINDINGS: We present a novel knowledge-based weighting framework that boosts the power of GWAS and strengthens their exploratory performance for follow-up replication and deep sequencing. Built on diverse integrated biological knowledge, the framework directly models both the prior functional information and the association significance emerging from GWAS to prioritize single nucleotide polymorphisms (SNPs) for subsequent replication. In theoretical calculations and computer simulations, it shows the potential to gain more than 15% additional power to identify an association signal of moderate strength, or to reach similar power with hundreds fewer subjects genotyped genome-wide. In a proof-of-principle case study on late-onset Alzheimer disease (LOAD), it highlighted several genes that showed positive association with LOAD in previous independent studies, as well as two important LOAD-related pathways. These genes and pathways might otherwise have been overlooked because the SNPs involved showed only moderate association significance.
CONCLUSIONS/SIGNIFICANCE: Implemented in a user-friendly, open-source Java package, this framework provides an important complementary approach for identifying more true susceptibility loci of modest or even small effect size in current GWAS of complex diseases.
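The abstract does not spell out how prior knowledge and association p-values are combined, so the following is only a minimal sketch of one generic technique from the weighted multiple-testing literature (a weighted Bonferroni rule), not the authors' framework or the API of their Java package. The function names, the prior scores, and the p-values are all hypothetical and chosen purely for illustration. The idea shown is that SNPs with stronger prior functional support receive a larger weight and therefore a more lenient threshold, while the weights are rescaled to average 1 so that the overall error budget stays at alpha.

```python
def normalize_weights(prior_scores):
    """Rescale non-negative prior scores so that they average to 1."""
    total = sum(prior_scores)
    if total <= 0:
        raise ValueError("prior scores must have a positive sum")
    m = len(prior_scores)
    return [score * m / total for score in prior_scores]


def weighted_bonferroni(p_values, prior_scores, alpha=0.05):
    """Return indices of SNPs whose weighted p-value passes alpha / m.

    A SNP is selected when p_i / w_i <= alpha / m, where the weights w_i
    are the prior scores rescaled to mean 1 (illustrative assumption).
    """
    if len(p_values) != len(prior_scores):
        raise ValueError("p_values and prior_scores must have equal length")
    weights = normalize_weights(prior_scores)
    threshold = alpha / len(p_values)
    selected = []
    for index, (p, w) in enumerate(zip(p_values, weights)):
        # A zero weight means the SNP can never be selected.
        if w > 0 and p / w <= threshold:
            selected.append(index)
    return selected


# Hypothetical p-values for four SNPs (not taken from the study).
p_values = [1e-3, 2e-2, 4e-1, 3e-2]

# Equal priors reduce to the ordinary Bonferroni correction.
unweighted = weighted_bonferroni(p_values, [1, 1, 1, 1])

# Up-weighting the second SNP (e.g. strong functional annotation)
# lets its moderate signal pass the threshold.
weighted = weighted_bonferroni(p_values, [1, 5, 1, 1])

print(unweighted, weighted)
```

In this toy input, the second SNP has only a moderate p-value and is missed under equal weights, but it is selected once its prior score is raised. The trade-off is that the remaining SNPs face a stricter threshold, which is why the quality of the prior knowledge matters.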