Department of Biostatistics, Epidemiology & Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.
Pac Symp Biocomput. 2022;27:325-336.
The polygenic risk score (PRS) can help identify an individual's genetic susceptibility to various diseases by combining the patient's genetic profile with single-nucleotide polymorphisms (SNPs) identified in genome-wide association studies. Although patients are often afflicted by multiple diseases at once or in succession, conventional PRSs do not consider genetic relationships across diseases. Even multi-trait PRSs, which account for genetic effects on more than one disease at a time, either consider too few phenotypes to accurately reflect a patient's comorbidity status or are biased in the traits they select. We therefore developed novel network-based comorbidity risk scores that quantify associations among multiple phenotypes from phenome-wide association studies (PheWAS). We first constructed a disease-SNP heterogeneous multi-layered network (DS-Net), which consists of a disease network (disease-layer) and a SNP network (SNP-layer). The disease-layer describes the population-level interactome from PheWAS data, and the SNP-layer was constructed according to linkage disequilibrium. The two layers were connected to translate information from the population-level interactome into individual-level inferences. Graph-based semi-supervised learning was then applied to predict possible comorbidity scores on the disease-layer for each subject: the SNP-layer receives the individual's genotyping data as input to the scoring process, and the disease-layer holds the propagated output, namely the individual's comorbidity scores for multiple diseases. The possible comorbidity scores were combined by logistic regression into a single score, denoted netCRS. The DS-Net was constructed from UK Biobank PheWAS data, and the individual genetic profiles were collected from the Penn Medicine Biobank. As a proof-of-concept study, myocardial infarction (MI) was selected to compare netCRS with the PRS with pruning and thresholding (PRS-PT).
The combined model (netCRS + PRS-PT + covariates) achieved an AUC improvement of 6.26% over the (PRS-PT + covariates) model. In terms of risk stratification, the combined model identified patients whose MI risk was up to approximately eight-fold higher than that of the low-risk group. netCRS and PRS-PT complement each other in predicting high-risk groups of patients with MI. We expect that these risk prediction models will support the development of prevention strategies and the reduction of MI morbidity and mortality.
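The propagation step described in the abstract, in which an individual's genotype enters on the SNP-layer and scores are read off the disease-layer, can be illustrated with a small sketch. The abstract does not give the exact propagation rule, edge weights, or hyperparameters, so everything below is an assumption: the closed form f = (I + mu L)^(-1) y is one common formulation of graph-based semi-supervised learning, and the function name `propagate_scores`, the parameter `mu`, and the toy matrices are hypothetical. The final logistic-regression step that combines the disease-layer scores into netCRS is not shown.

```python
import numpy as np

def propagate_scores(W_snp, W_dis, B, genotype, mu=1.0):
    """Propagate one individual's SNP-layer values to the disease layer.

    Assumed inputs (all hypothetical, not taken from the paper):
      W_snp    (s, s) symmetric SNP-SNP weights, e.g. derived from LD
      W_dis    (d, d) symmetric disease-disease weights
      B        (s, d) SNP-disease link weights joining the two layers
      genotype (s,)   per-SNP values for one individual
      mu       smoothness trade-off of the assumed propagation rule
    """
    s, d = B.shape
    # Stack both layers into one symmetric adjacency matrix.
    W = np.block([[W_snp, B], [B.T, W_dis]])
    # Unnormalized graph Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W
    # Initial labels: genotype on SNP nodes, zero (unknown) on disease nodes.
    y = np.concatenate([genotype, np.zeros(d)])
    # Assumed closed form: solve (I + mu * L) f = y.
    f = np.linalg.solve(np.eye(s + d) + mu * L, y)
    # The disease-layer entries are the propagated scores.
    return f[s:]

# Toy network: 3 SNPs, 2 diseases (values are illustrative only).
W_snp = np.array([[0.0, 0.8, 0.0],
                  [0.8, 0.0, 0.0],
                  [0.0, 0.0, 0.0]])
W_dis = np.array([[0.0, 0.5],
                  [0.5, 0.0]])
B = np.array([[1.0, 0.0],
              [0.0, 0.0],
              [0.0, 1.0]])
genotype = np.array([2.0, 1.0, 0.0])

scores = propagate_scores(W_snp, W_dis, B, genotype)
print(scores)
```

In this toy network the first disease is linked to the SNPs that carry non-zero genotype values, so it receives the higher propagated score. In a full pipeline, the per-individual score vectors would then serve as features for a classifier such as logistic regression.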