Department of Biostatistics and Data Science, University of Texas Health Science Center, Houston, USA.
Department of Psychology, Florida International University, Miami, USA.
BMC Bioinformatics. 2022 Jan 10;23(1):28. doi: 10.1186/s12859-022-04566-5.
BACKGROUND/AIM: The polygenic risk score (PRS) shows promise as an approach to summarizing genetic risk for complex diseases such as alcohol use disorder, which are influenced by a combination of many variants, each with a very small effect. Yet conventional PRS methods tend to over-adjust for confounding factors in the discovery sample and thus have low power to predict the phenotype in the target sample. This study aims to address this methodological issue.
METHODS: We propose a new method to construct the PRS by (1) approximating the polygenic model using a few principal components selected based on eigen-correlation in the discovery data; and (2) conducting principal component projection on the target data. To compare the performance of the conventional and proposed methods, secondary data analysis was conducted on two large-scale databases: the Study of Addiction: Genetics and Environment (SAGE; discovery data) and the National Longitudinal Study of Adolescent to Adult Health (Add Health; target data).
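The two steps can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that "eigen-correlation" means the Pearson correlation between each principal component score and the phenotype in the discovery data, and that selection keeps the components with the largest absolute correlation. The function names `fit_pc_prs` and `score_target` and the parameter `n_keep` are hypothetical.

```python
import numpy as np

def fit_pc_prs(G_disc, y_disc, n_keep=5):
    """Fit a PC-based score on discovery data (samples x SNPs allele counts).

    Assumption: PCs are ranked by |correlation with the phenotype|,
    and the phenotype is not constant.
    """
    G = np.asarray(G_disc, dtype=float)
    y = np.asarray(y_disc, dtype=float)
    mu = G.mean(axis=0)
    sd = G.std(axis=0)
    sd[sd == 0] = 1.0                      # avoid dividing by zero for monomorphic SNPs
    Z = (G - mu) / sd                      # standardized genotypes
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    yc = y - y.mean()
    valid = np.flatnonzero(S > 1e-8 * S[0])  # drop numerically null components
    corr = np.zeros(S.shape[0])
    # PC score j is U[:, j] * S[j]; U[:, j] is unit-norm and mean-zero,
    # so its Pearson correlation with y is U[:, j] . yc / ||yc||.
    corr[valid] = (U[:, valid].T @ yc) / np.linalg.norm(yc)
    order = np.argsort(-np.abs(corr[valid]))[:n_keep]
    keep = valid[order]
    # Least-squares coefficients of yc on the selected (orthogonal) PC scores.
    gamma = (U[:, keep].T @ yc) / S[keep]
    # Fold PC loadings and coefficients into one weight per SNP.
    beta = Vt[keep].T @ gamma
    return {"mu": mu, "sd": sd, "beta": beta, "keep": keep, "corr": corr}

def score_target(model, G_target):
    """Project target genotypes onto the discovery PCs and return the score."""
    G = np.asarray(G_target, dtype=float)
    Z = (G - model["mu"]) / model["sd"]    # reuse discovery means and SDs
    return Z @ model["beta"]
```

In this sketch the target data are standardized with the discovery means and standard deviations, so the target samples are projected onto components estimated from the discovery sample rather than re-estimated. In practice the SNP set would first be filtered by LD and p-value thresholds, which the sketch leaves out.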
RESULTS: The proposed method has higher predictive power than the conventional method and can handle participants from different ancestry backgrounds. We also provide practical recommendations for setting the linkage disequilibrium (LD) and p-value thresholds.