Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America.
Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America.
PLoS Genet. 2019 Jun 13;15(6):e1008202. doi: 10.1371/journal.pgen.1008202. eCollection 2019 Jun.
Polygenic risk scores (PRS) are single summary measures that are easy to construct and that condense information from a large number of genetic variants associated with a disease. They have been used for stratification and prediction of disease risk. The primary focus of this paper is to demonstrate how PRS can be combined with electronic health records (EHR) data to better understand the shared and unique genetic architecture and etiology of disease subtypes that may be both related and heterogeneous. PRS construction strategies often depend on the purpose of the study, the available data and summary estimates, and the underlying genetic architecture of the disease. We consider several choices for constructing a PRS using data obtained from various publicly available sources, including the UK Biobank, and evaluate their ability to predict not only the primary phenotype but also secondary phenotypes derived from EHR. This study was conducted using data from 30,702 unrelated, genotyped patients of recent European descent from the Michigan Genomics Initiative (MGI), a longitudinal biorepository effort within Michigan Medicine. We examine the three most common skin cancer subtypes in the United States: basal cell carcinoma, cutaneous squamous cell carcinoma, and melanoma. Using PRS for each of these skin cancer subtypes, we conduct a phenome-wide association study (PheWAS) within the MGI data to evaluate PRS associations with secondary traits. The PheWAS results are then replicated using population-based UK Biobank data and compared across the PRS construction methods. We also develop an accompanying visual catalog, PRSweb, that provides detailed PheWAS results and allows users to directly compare different PRS construction methods.
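To make the idea of a PRS as a single summary measure concrete, the sketch below computes a score in its most common generic form: a weighted sum of risk-allele dosages, with weights taken from per-variant effect estimates. It is an illustration only and does not reproduce any of the specific construction methods compared in the paper. The function names, the weights, and the genotype values are all hypothetical toy inputs.

```python
from statistics import mean, pstdev


def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages for one individual.

    dosages: allele counts (0, 1, or 2) per variant.
    weights: per-variant effect estimates, e.g. log odds ratios
             from published summary statistics (assumed here).
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(w * g for w, g in zip(weights, dosages))


def standardize(scores):
    """Center and scale scores to mean 0 and standard deviation 1."""
    mu = mean(scores)
    sd = pstdev(scores)
    if sd == 0:
        raise ValueError("scores have zero variance")
    return [(s - mu) / sd for s in scores]


# Hypothetical toy data: three variants, three individuals.
weights = [0.10, 0.25, -0.05]
genotypes = [
    [0, 1, 2],
    [2, 0, 1],
    [1, 1, 0],
]

raw_scores = [polygenic_risk_score(g, weights) for g in genotypes]
z_scores = standardize(raw_scores)
```

Standardizing the scores is a common convenience so that an association estimate can be read per standard deviation of the PRS. Which variants enter the sum and how their weights are chosen is exactly the kind of construction choice the paper compares.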