Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC 29425, USA.
Division of Biostatistics and Bioinformatics, National Jewish Health, Denver, CO 80206, USA.
Bioinformatics. 2022 Jan 27;38(4):1067-1074. doi: 10.1093/bioinformatics/btab802.
Despite the great success of genome-wide association studies (GWAS), multiple challenges remain. First, complex traits are often associated with many single nucleotide polymorphisms (SNPs), each with a small or moderate effect size. Second, our understanding of the functional mechanisms through which genetic variants are associated with complex traits is still limited. To address these challenges, we propose GPA-Tree, which simultaneously performs association mapping and identifies key combinations of functional annotations related to risk-associated SNPs by combining a decision tree algorithm with a hierarchical modeling framework.
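The idea of letting functional annotations inform the prior probability that a SNP is risk-associated can be illustrated with a toy sketch. This is not the GPATree implementation. It assumes, purely for illustration, that null p-values are Uniform(0, 1), that non-null p-values follow Beta(alpha, 1), and that there are two binary annotations. The decision tree is replaced by per-combination means, which is what a fully grown tree on two binary annotations reduces to. The function names `simulate` and `fit` and all numeric settings are hypothetical.

```python
import math
import random
from collections import defaultdict


def simulate(n=4000, alpha=0.3, seed=1):
    """Toy data: SNPs carrying both annotations are more often risk-associated."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        a = (rng.random() < 0.3, rng.random() < 0.3)  # two binary annotations
        prior = 0.6 if all(a) else 0.05               # assumed toy priors
        is_risk = rng.random() < prior
        u = 1.0 - rng.random()                        # in (0, 1], avoids log(0)
        # Beta(alpha, 1) has CDF p**alpha, so its inverse CDF is u**(1/alpha).
        p = u ** (1.0 / alpha) if is_risk else u
        data.append((p, a, is_risk))
    return data


def fit(data, n_iter=50):
    """EM for a Uniform / Beta(alpha, 1) mixture with an annotation-dependent prior."""
    alpha = 0.5
    pi = defaultdict(lambda: 0.1)  # prior P(risk | annotation combination)
    z = []
    for _ in range(n_iter):
        # E-step: posterior probability that each SNP is risk-associated.
        z = []
        for p, a, _ in data:
            num = pi[a] * alpha * p ** (alpha - 1.0)  # prior * Beta(alpha, 1) density
            z.append(num / (num + (1.0 - pi[a])))     # Uniform density is 1
        # M-step for alpha: weighted maximum likelihood for Beta(alpha, 1).
        alpha = -sum(z) / sum(zi * math.log(p) for zi, (p, _, _) in zip(z, data))
        # M-step for the prior: mean posterior within each annotation combination
        # (a stand-in for fitting a regression tree to the posteriors).
        total, count = defaultdict(float), defaultdict(int)
        for zi, (_, a, _) in zip(z, data):
            total[a] += zi
            count[a] += 1
        pi = {a: total[a] / count[a] for a in count}
    return alpha, pi, z


if __name__ == "__main__":
    alpha_hat, pi_hat, _ = fit(simulate())
    print("estimated alpha:", round(alpha_hat, 3))
    for combo in sorted(pi_hat):
        print(combo, round(pi_hat[combo], 3))
```

In this sketch, annotation combinations with a high fitted prior play the role that the leaves of a tree would play: they flag which combinations of annotations are enriched for risk-associated SNPs, while the posteriors support association mapping.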
First, we conducted simulation studies to evaluate the proposed GPA-Tree method and compared its performance with that of existing statistical approaches. The results indicate that GPA-Tree outperforms existing statistical approaches in detecting risk-associated SNPs and identifies the true combinations of functional annotations with high accuracy. Second, we applied GPA-Tree to a systemic lupus erythematosus (SLE) GWAS together with functional annotation data, including GenoSkyline and GenoSkylinePlus. The results from GPA-Tree highlight the dysregulation of blood immune cells in SLE, including but not limited to primary B cells, memory helper T cells, regulatory T cells, neutrophils and CD8+ memory T cells. These results demonstrate that GPA-Tree can be a powerful tool that improves association mapping while facilitating understanding of the underlying genetic architecture of complex traits and of the potential mechanisms linking risk-associated SNPs with complex traits.
The GPATree software is available at https://dongjunchung.github.io/GPATree/.
Supplementary data are available at Bioinformatics online.