Truong Buu, Hull Leland E, Ruan Yunfeng, Huang Qin Qin, Hornsby Whitney, Martin Hilary, van Heel David A, Wang Ying, Martin Alicia R, Lee S Hong, Natarajan Pradeep
Program in Medical and Population Genetics and the Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA 02142.
Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, 185 Cambridge Street, Boston, MA, 02114.
medRxiv. 2023 Mar 23:2023.02.21.23286110. doi: 10.1101/2023.02.21.23286110.
Polygenic risk scores (PRS) are an emerging tool to predict the clinical phenotypes and outcomes of individuals. Validation and transferability of existing PRS across independent datasets and diverse ancestries are limited, which hinders their practical utility and exacerbates health disparities. We propose PRSmix, a framework that evaluates and leverages the PRS corpus of a target trait to improve prediction accuracy, and PRSmix+, which incorporates genetically correlated traits to better capture the human genetic architecture. We applied PRSmix to 47 and 32 diseases/traits in European and South Asian ancestries, respectively. PRSmix demonstrated a mean prediction accuracy improvement of 1.20-fold (95% CI: [1.10; 1.3]; P-value = 9.17 × 10) and 1.19-fold (95% CI: [1.11; 1.27]; P-value = 1.92 × 10), and PRSmix+ improved the prediction accuracy by 1.72-fold (95% CI: [1.40; 2.04]; P-value = 7.58 × 10) and 1.42-fold (95% CI: [1.25; 1.59]; P-value = 8.01 × 10) in European and South Asian ancestries, respectively. Compared with the previously established cross-trait-combination method, which uses scores from pre-defined correlated traits, our method improved prediction accuracy for coronary artery disease by up to 3.27-fold (95% CI: [2.1; 4.44]; P-value after FDR correction = 2.6 × 10). Our method provides a comprehensive framework to benchmark and leverage the combined power of PRS for maximal performance in a desired target population.
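The idea of combining several existing scores for one trait can be illustrated with a minimal sketch. The abstract does not state how PRSmix weights the scores, so everything here is an assumption made for illustration: the mixing weights come from a ridge-penalized linear regression fit on a training split, accuracy is measured as squared correlation on a held-out split, the fold improvement is taken relative to the best single score, and the data are synthetic. The function names (`fit_mixing_weights`, `mix_scores`, `r_squared`) are hypothetical and are not part of any PRSmix software.

```python
import numpy as np


def fit_mixing_weights(train_scores, train_y, ridge=1.0):
    """Fit linear mixing weights for a set of scores (assumed ridge regression)."""
    mean = train_scores.mean(axis=0)
    std = train_scores.std(axis=0)
    z = (train_scores - mean) / std          # standardize each score column
    y_centered = train_y - train_y.mean()
    k = z.shape[1]
    # Ridge-regularized normal equations: (Z'Z + ridge * I) w = Z'y
    weights = np.linalg.solve(z.T @ z + ridge * np.eye(k), z.T @ y_centered)
    return weights, mean, std


def mix_scores(scores, weights, mean, std):
    """Apply training-split standardization and weights to new samples."""
    return ((scores - mean) / std) @ weights


def r_squared(pred, y):
    """Squared Pearson correlation between a score and the phenotype."""
    return float(np.corrcoef(pred, y)[0, 1] ** 2)


# Synthetic example: two noisy scores that capture different parts of the
# trait, plus one uninformative score. These are not real PRS or phenotypes.
rng = np.random.default_rng(0)
n = 2000
g1, g2 = rng.normal(size=n), rng.normal(size=n)
y = g1 + g2 + rng.normal(size=n)
scores = np.column_stack([
    g1 + rng.normal(size=n),
    g2 + rng.normal(size=n),
    rng.normal(size=n),
])

train, test = slice(0, 1000), slice(1000, None)
w, m, s = fit_mixing_weights(scores[train], y[train], ridge=1.0)
mixed = mix_scores(scores[test], w, m, s)

# Compare the mixed score with the best single score on held-out samples.
best_single = max(r_squared(scores[test][:, j], y[test]) for j in range(3))
fold = r_squared(mixed, y[test]) / best_single
print(f"held-out R^2 fold improvement over best single score: {fold:.2f}")
```

Fitting the weights on one split and scoring on another keeps the comparison with the single scores fair, since the mixed score otherwise gains an advantage from being tuned on the same samples it is evaluated on.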