Li Xiaoqi, Kharitonova Elena, Pang Minxing, Wen Jia, Zhou Laura Y, Raffield Laura, Zhou Haibo, Yao Huaxiu, Chen Can, Li Yun, Sun Quan
Carolina Health Informatics Program, University of North Carolina, Chapel Hill, NC, USA.
Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA.
HGG Adv. 2025 Aug 8;6(4):100490. doi: 10.1016/j.xhgg.2025.100490.
Genetic prediction of complex traits, enabled by large-scale genomic studies, has created new measures for understanding individual genetic predisposition. Polygenic risk scores (PRSs) aggregate information across the genome, enabling personalized risk prediction for complex traits and diseases. However, conventional PRS calculation methods that rely on linear models are limited in their ability to capture complex patterns and interaction effects in high-dimensional genomic data. In this study, we seek to improve the predictive power of PRSs by applying advanced deep learning techniques. We show that a variational autoencoder-based model for PRS construction (VAE-PRS) outperforms current state-of-the-art methods for biobank-level data in 14 of 16 blood cell traits while remaining computationally efficient. Through comprehensive experiments, we found that VAE-PRS can capture interaction effects in high-dimensional data and performs robustly across different pre-screened variant sets. Furthermore, VAE-PRS is readily interpretable: the Shapley additive explanations method quantifies the contribution of each individual marker to the final prediction score, providing potential new insights for identifying trait-associated genetic variants. In summary, VAE-PRS offers an approach to genetic risk prediction for blood cell traits that harnesses deep learning methods, given an appropriate training sample size, and could further facilitate the development of personalized medicine and genetic research.
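For orientation, the conventional linear PRS that the abstract contrasts with VAE-PRS can be sketched as a weighted sum of allele dosages. The snippet below is a minimal illustration under stated assumptions, not the authors' method: the function names `linear_prs` and `linear_contributions`, the example weights, and the dosage values are all hypothetical. It also shows the per-marker attribution that Shapley-style explanations reduce to in the purely linear case with independent features, namely each weight times the deviation of the dosage from a reference mean.

```python
from typing import List, Sequence


def linear_prs(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Conventional linear PRS for one individual: sum of weight * dosage.

    `dosages` holds effect-allele counts (0, 1, or 2) per variant and
    `weights` holds per-variant effect sizes. Both are illustrative inputs.
    """
    if len(dosages) != len(weights):
        raise ValueError("one weight per variant is required")
    return sum(w * d for w, d in zip(weights, dosages))


def linear_contributions(
    dosages: Sequence[float],
    weights: Sequence[float],
    mean_dosages: Sequence[float],
) -> List[float]:
    """Per-variant contribution relative to a reference (cohort mean) dosage.

    For a linear model with independent features, this is the form that
    Shapley-style attributions take; it does not apply to a nonlinear
    model such as a variational autoencoder.
    """
    if not (len(dosages) == len(weights) == len(mean_dosages)):
        raise ValueError("inputs must all have one entry per variant")
    return [w * (d - m) for w, d, m in zip(weights, dosages, mean_dosages)]


# Hypothetical toy data: two individuals, three variants.
weights = [0.5, -0.25, 0.125]
person_a = [0, 1, 2]
person_b = [2, 0, 1]
mean_dosages = [1.0, 0.5, 1.5]  # column means of the two individuals

score_a = linear_prs(person_a, weights)
score_b = linear_prs(person_b, weights)
contrib_a = linear_contributions(person_a, weights, mean_dosages)
```

In this toy setting the contributions for an individual sum to that individual's score minus the mean score of the cohort. A linear score of this kind has no interaction terms, which is the limitation the abstract attributes to conventional PRS methods.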