National Institute of Biomedical Genomics, Kalyani, India.
Genet Epidemiol. 2020 Nov;44(8):841-853. doi: 10.1002/gepi.22345. Epub 2020 Aug 10.
Many variants with low frequencies or with low to modest effects likely remain unidentified in genome-wide association studies (GWAS) because of stringent genome-wide thresholds for detection. To improve the power of detection, prioritization of variants based on their functional annotations and epigenetic landmarks has been used successfully. Here, we propose a novel method of prioritization of a GWAS by exploiting gene-level knowledge (e.g., annotations to pathways and ontologies) and show that it further improves power. Often, disease-associated variants are found near genes that are coinvolved in specific biological pathways relevant to the disease process. Utilization of this knowledge to conduct a prioritized scan increases the power to detect loci that map to genes clustered in a few specific pathways. We have developed a computationally scalable framework based on penalized logistic regression (termed GKnowMTest: Genomic Knowledge-guided Multiple Testing) to enable a prioritized pathway-guided GWAS scan with a very large number of gene-level annotations. We demonstrate that the proposed strategy improves overall power and maintains the Type 1 error rate globally. Our method works on genome-wide summary-level data and a user-specified list of pathways (e.g., those extracted from large pathway databases without reference to the biology of a specific disease). It automatically reweights the input p values by incorporating the pathway enrichments as "adaptively learned" from the data, using a cross-validation technique to avoid overfitting. We used whole-genome simulations and some publicly available GWAS data sets to illustrate the application of our method. The GKnowMTest framework has been implemented as a user-friendly open-source R package.
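The idea of reweighting p values can be illustrated with a minimal Python sketch of a generic weighted Bonferroni procedure. This is not the GKnowMTest algorithm or the API of its R package: the function name `weighted_bonferroni` and the example numbers are hypothetical, and the weights are supplied by hand here, whereas the abstract describes them as learned from pathway annotations by penalized logistic regression with cross-validation. The sketch assumes only the standard result that weights that are non-negative, average to 1, and are chosen independently of the p values preserve the family-wise error rate.

```python
def weighted_bonferroni(p_values, raw_weights, alpha=0.05):
    """Generic weighted Bonferroni test (illustrative, not GKnowMTest itself).

    Returns the normalized weights and the indices of rejected hypotheses.
    """
    if len(p_values) != len(raw_weights):
        raise ValueError("p_values and raw_weights must have the same length")
    if any(w < 0 for w in raw_weights):
        raise ValueError("weights must be non-negative")
    total = sum(raw_weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")

    m = len(p_values)
    # Rescale so the weights average to 1; this keeps the overall
    # error budget equal to alpha, as in unweighted Bonferroni.
    weights = [w * m / total for w in raw_weights]

    threshold = alpha / m
    rejected = []
    for i, (p, w) in enumerate(zip(p_values, weights)):
        # A weight of 0 means the test can never be rejected.
        if w > 0 and p / w <= threshold:
            rejected.append(i)
    return weights, rejected


if __name__ == "__main__":
    # Hypothetical p values for four variants; the second one is
    # up-weighted, as if it mapped to an enriched pathway.
    p = [0.004, 0.02, 0.30, 0.60]
    print(weighted_bonferroni(p, [1, 1, 1, 1]))  # unweighted baseline
    print(weighted_bonferroni(p, [1, 4, 1, 2]))  # prioritized scan
```

In this toy example the second variant misses the unweighted threshold but passes once its weight is raised, at the cost of a stricter threshold for the down-weighted variants.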