Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark.
The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
Nucleic Acids Res. 2023 Jul 7;51(12):e67. doi: 10.1093/nar/gkad373.
Polygenic risk scores (PRSs) are expected to play a critical role in precision medicine. Currently, PRS predictors are generally based on linear models built from summary statistics and, more recently, from individual-level data. However, these predictors mainly capture additive relationships and are limited in the data modalities they can use. We developed a deep learning framework (EIR) for PRS prediction, which includes a model, genome-local-net (GLN), specifically designed for large-scale genomics data. The framework supports multi-task learning, automatic integration of other clinical and biochemical data, and model explainability. When applied to individual-level data from the UK Biobank, the GLN model performed competitively with established neural network architectures, particularly for certain traits, showcasing its potential for modeling complex genetic relationships. Furthermore, the GLN model outperformed linear PRS methods for type 1 diabetes (T1D), likely because it models non-additive genetic effects and epistasis. This was supported by our identification of widespread non-additive genetic effects and epistasis in the context of T1D. Finally, we constructed PRS models that integrated genotype, blood, urine, and anthropometric data and found that this improved performance for 93% of the 290 diseases and disorders considered. EIR is available at https://github.com/arnor-sigurdsson/EIR.
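To make the contrast between additive and non-additive effects concrete, here is a minimal sketch, not taken from EIR, of a conventional linear PRS (a weighted sum of allele dosages) next to a hypothetical extension with pairwise interaction (epistasis) terms. The function names, effect sizes, interaction coefficients, and genotypes are illustrative assumptions only. A neural network such as GLN would instead learn non-linear relationships from the data rather than rely on a hand-specified interaction table.

```python
# Illustrative toy example (hypothetical values, not from EIR or the UK Biobank).
# Genotypes are allele dosages: 0, 1 or 2 copies of the effect allele per SNP.


def linear_prs(dosages, betas):
    """Additive PRS: sum over SNPs j of beta_j * x_j."""
    if len(dosages) != len(betas):
        raise ValueError("dosages and betas must have the same length")
    return sum(b * x for b, x in zip(betas, dosages))


def prs_with_epistasis(dosages, betas, interactions):
    """Additive PRS plus hypothetical pairwise interaction terms.

    `interactions` maps a SNP index pair (j, k) to a coefficient gamma_jk,
    which adds gamma_jk * x_j * x_k to the score. A purely additive model
    cannot represent this term, because its contribution depends on the
    joint genotype at both SNPs.
    """
    score = linear_prs(dosages, betas)
    for (j, k), gamma in interactions.items():
        score += gamma * dosages[j] * dosages[k]
    return score


if __name__ == "__main__":
    betas = [0.2, -0.1, 0.3]      # hypothetical per-SNP effect sizes
    interactions = {(0, 1): 0.5}  # hypothetical SNP0 x SNP1 interaction
    person = [2, 1, 0]            # hypothetical dosages for one individual

    print("additive PRS:", linear_prs(person, betas))
    print("with epistasis:", prs_with_epistasis(person, betas, interactions))
```

For this individual the additive score is about 0.3, and the interaction term adds 0.5 × 2 × 1 = 1.0, giving about 1.3; an individual carrying no effect allele at SNP 0 or SNP 1 would receive no contribution from the interaction.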