Shenzhen Research Institute of Big Data, Shenzhen 518172, China.
Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China.
Bioinformatics. 2023 Feb 3;39(2). doi: 10.1093/bioinformatics/btad068.
The findings from genome-wide association studies (GWASs) have greatly helped us to understand the genetic basis of human complex traits and diseases. Despite the tremendous progress, much effort is still needed to address several major challenges arising in GWASs. First, most GWAS hits are located in non-coding regions of the human genome, and thus their biological functions remain largely unknown. Second, due to the polygenicity of human complex traits and diseases, many genetic risk variants with weak or moderate effects have not yet been identified.
To address the above challenges, we propose a powerful and adaptive latent model (PALM) to integrate cell-type/tissue-specific functional annotations with GWAS summary statistics. Unlike existing methods, which are mainly based on linear models, PALM leverages a tree ensemble to adaptively characterize the non-linear relationship between functional annotations and the association status of genetic variants. To make PALM scalable to millions of variants and hundreds of functional annotations, we develop a functional gradient-based expectation-maximization algorithm that fits the tree-based non-linear model in a stable manner. Through comprehensive simulation studies, we show that PALM not only controls the false discovery rate well, but also improves the statistical power to identify risk variants. We also apply PALM to integrate summary statistics of 30 GWASs with 127 cell-type/tissue-specific functional annotations. The results indicate that PALM can identify more risk variants as well as rank the importance of functional annotations, yielding better interpretation of GWAS results.
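To make the idea of a functional gradient-based EM algorithm concrete, the following is a minimal sketch under stated assumptions, not the PALM implementation. It assumes a two-group model in which null p-values are uniform and non-null p-values follow Beta(alpha, 1), binary annotations, and one depth-one regression tree (a stump) added per EM iteration. The abstract specifies none of these choices, and the names `fit_palm_sketch` and `fit_stump` are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_stump(A, resid):
    """Pick the binary annotation whose split best fits the residuals."""
    total, best = sum(resid), None
    for k in range(len(A[0])):
        n1 = sum(a[k] for a in A)
        n0 = len(A) - n1
        if n0 == 0 or n1 == 0:
            continue  # constant annotation: no split possible
        s1 = sum(r for a, r in zip(A, resid) if a[k] == 1)
        m1, m0 = s1 / n1, (total - s1) / n0
        gain = n0 * m0 * m0 + n1 * m1 * m1  # squared-error reduction, up to a constant
        if best is None or gain > best[0]:
            best = (gain, k, m0, m1)
    if best is None:  # fall back to a constant update
        return 0, total / len(resid), total / len(resid)
    return best[1:]


def fit_palm_sketch(pvals, A, n_iter=50, lr=0.5):
    """Assumed model: p | null ~ U(0,1), p | risk ~ Beta(alpha, 1),
    P(risk | annotations) = sigmoid(F(annotations)), F = sum of stumps."""
    n = len(pvals)
    logp = [math.log(max(p, 1e-300)) for p in pvals]
    F = [math.log(0.1 / 0.9)] * n  # start from a 10% prior for every variant
    alpha, stumps, gamma = 0.5, [], [0.1] * n
    for _ in range(n_iter):
        pi = [sigmoid(f) for f in F]
        # E-step: posterior probability that each variant is a risk variant
        gamma = []
        for q, lp in zip(pi, logp):
            f1 = q * alpha * math.exp((alpha - 1.0) * lp)
            gamma.append(f1 / (f1 + 1.0 - q))
        # M-step for alpha: closed-form weighted MLE of Beta(alpha, 1)
        alpha = -sum(gamma) / sum(g * lp for g, lp in zip(gamma, logp))
        alpha = min(max(alpha, 1e-3), 0.999)
        # Functional gradient step: the gradient of the expected complete
        # log-likelihood with respect to F_j is gamma_j - pi_j; fit a stump to it
        resid = [g - q for g, q in zip(gamma, pi)]
        k, v0, v1 = fit_stump(A, resid)
        stumps.append((k, v0, v1))
        F = [f + lr * (v1 if a[k] == 1 else v0) for f, a in zip(F, A)]
    return alpha, F, gamma, stumps


# Toy data (simulated, purely illustrative): only annotation 0 is informative.
random.seed(0)
n = 2000
A = [[random.randint(0, 1) for _ in range(3)] for _ in range(n)]
pvals = []
for a in A:
    is_risk = random.random() < (0.3 if a[0] == 1 else 0.02)
    u = 1.0 - random.random()  # uniform on (0, 1]
    pvals.append(u ** (1 / 0.2) if is_risk else u)  # Beta(0.2, 1) if risk

alpha_hat, F, gamma, stumps = fit_palm_sketch(pvals, A)
prior = [sigmoid(f) for f in F]
```

In this sketch, `prior` holds the fitted annotation-dependent prior for each variant and `gamma` the posterior probabilities from the last E-step. Counting how often each annotation index appears in `stumps` gives a crude stand-in for the annotation importance ranking the abstract mentions. A practical implementation would likely use deeper trees and vectorized code to reach millions of variants.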
The source code is available at https://github.com/YangLabHKUST/PALM.
Supplementary data are available at Bioinformatics online.