Evaluation of Bayesian Linear Regression models for gene set prioritization in complex diseases.

Affiliations

Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark.

Department of Biomedicine, Aarhus University, Aarhus, Denmark.

Publication information

PLoS Genet. 2024 Nov 4;20(11):e1011463. doi: 10.1371/journal.pgen.1011463. eCollection 2024 Nov.

DOI:10.1371/journal.pgen.1011463

PMID:39495786

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11563439/

Abstract

Genome-wide association studies (GWAS) provide valuable insights into the genetic architecture of complex traits, yet interpreting their results remains challenging due to the polygenic nature of most traits. Gene set analysis offers a solution by aggregating genetic variants into biologically relevant pathways, enhancing the detection of coordinated effects across multiple genes. In this study, we present and evaluate a gene set prioritization approach utilizing Bayesian Linear Regression (BLR) models to uncover shared genetic components among different phenotypes and facilitate biological interpretation. Through extensive simulations and analyses of real traits, we demonstrate the efficacy of the BLR model in prioritizing pathways for complex traits. Simulation studies reveal insights into the model's performance under various scenarios, highlighting the impact of factors such as the number of causal genes, proportions of causal variants, heritability, and disease prevalence. Comparative analyses with MAGMA (Multi-marker Analysis of GenoMic Annotation) demonstrate BLR's superior performance, especially in highly overlapped gene sets. Application of both single-trait and multi-trait BLR models to real data, specifically GWAS summary data for type 2 diabetes (T2D) and related phenotypes, identifies significant associations with T2D-related pathways. Furthermore, comparison between single- and multi-trait BLR analyses highlights the superior performance of the multi-trait approach in identifying associated pathways, showcasing increased statistical power when analyzing multiple traits jointly. Additionally, enrichment analysis with integrated data from various public resources supports our results, confirming significant enrichment of diabetes-related genes within the top T2D pathways resulting from the multi-trait analysis. 
The BLR model's ability to handle diverse genomic features, perform regularization, conduct variable selection, and integrate information from multiple traits, genders, and ancestries demonstrates its utility in understanding the genetic architecture of complex traits. Our study provides insights into the potential of the BLR model to prioritize gene sets, offering a flexible framework applicable to various datasets. This model presents opportunities for advancing personalized medicine by exploring the genetic underpinnings of multifactorial traits.
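To make the idea of regression-based gene set prioritization concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that a gene-level association statistic is regressed on a binary gene-by-set membership matrix, and it swaps in a simple conjugate Gaussian (ridge-type) prior in place of the regularizing, variable-selecting priors the abstract refers to. The data are simulated, and every name (`rank_gene_sets`, `sigma2`, `tau2`, the matrix sizes) is hypothetical.

```python
import numpy as np


def rank_gene_sets(X, y, sigma2=1.0, tau2=1.0):
    """Rank gene sets by posterior evidence for a non-zero effect.

    Assumed model (illustrative only):
        y ~ N(X b, sigma2 * I),  b ~ N(0, tau2 * I)
    X : (n_genes, n_sets) binary membership matrix
    y : (n_genes,) gene-level association statistics
    """
    n_sets = X.shape[1]
    # Posterior precision and covariance of the set effects (conjugate result).
    precision = X.T @ X / sigma2 + np.eye(n_sets) / tau2
    cov = np.linalg.inv(precision)
    # Posterior mean and marginal posterior standard deviations.
    mean = cov @ X.T @ y / sigma2
    sd = np.sqrt(np.diag(cov))
    # Sets with the largest |mean / sd| come first.
    order = np.argsort(-np.abs(mean / sd))
    return mean, sd, order


# Simulated example: 500 genes, 5 overlapping sets, only set 0 is causal.
rng = np.random.default_rng(0)
n_genes, n_sets = 500, 5
X = (rng.random((n_genes, n_sets)) < 0.2).astype(float)
true_effect = np.zeros(n_sets)
true_effect[0] = 1.0
y = X @ true_effect + rng.normal(size=n_genes)

mean, sd, order = rank_gene_sets(X, y)
print("sets ranked by |posterior mean / sd|:", order)
```

Because all sets enter the model jointly, a gene shared by several sets contributes to each of their columns, and each set's effect is estimated conditional on the others. This is one plausible intuition for why a joint regression can behave differently from set-by-set testing when gene sets overlap heavily; the sketch does not reproduce the paper's comparison with MAGMA.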
