Neurology, UCLA, Los Angeles, California.
School of Medicine, UCSF, San Francisco, California.
J Comput Biol. 2020 Apr;27(4):599-612. doi: 10.1089/cmb.2019.0325. Epub 2020 Feb 20.
Large-scale cohorts with combined genetic and phenotypic data, coupled with methodological advances, have produced increasingly accurate genetic predictors of complex human phenotypes, called polygenic risk scores (PRSs). Beyond their potential translational impact in identifying at-risk individuals, PRSs are being used in a growing list of scientific applications, including causal inference, identification of pleiotropy and genetic correlation, and powerful gene-based and mixed-model association tests. Existing PRS approaches rely on external large-scale genetic cohorts that have also measured the phenotype of interest. They further require matching on ancestry and on genotyping platform or imputation quality. In this work, we present a novel reference-free method to produce a PRS that does not rely on an external cohort. We show that naive implementations of reference-free PRSs result in either substantial overfitting or prohibitive increases in computational time. We show that our algorithm avoids both of these issues and can produce informative in-sample PRSs over a single cohort without overfitting. We then demonstrate several novel applications of reference-free PRSs, including detection of pleiotropy across 246 metabolic traits and efficient mixed-model association testing.
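The overfitting problem mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the authors' algorithm, which the abstract does not describe; it is a generic comparison, under assumed settings, between a naive in-sample PRS (effects estimated and scored on the same individuals) and a K-fold out-of-fold PRS (each individual scored with effects estimated without them). The function names, the marginal-regression effect estimator, the fold count, and the simulation sizes are all hypothetical choices made for illustration. The phenotype is simulated with no genetic signal, so any correlation between a PRS and the phenotype reflects overfitting.

```python
import numpy as np

def marginal_effects(G, y):
    """Per-SNP marginal regression slopes of y on each centered genotype column."""
    Gc = G - G.mean(axis=0)
    yc = y - y.mean()
    return Gc.T @ yc / (Gc ** 2).sum(axis=0)

def naive_in_sample_prs(G, y):
    """Estimate effects and score the SAME individuals (prone to overfitting)."""
    beta = marginal_effects(G, y)
    return (G - G.mean(axis=0)) @ beta

def kfold_prs(G, y, k=5, seed=0):
    """Score each fold with effects estimated on the other k-1 folds only."""
    rng = np.random.default_rng(seed)
    n = G.shape[0]
    folds = np.array_split(rng.permutation(n), k)
    prs = np.empty(n)
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        beta = marginal_effects(G[train], y[train])
        # Center held-out genotypes with training means to avoid leakage.
        prs[test] = (G[test] - G[train].mean(axis=0)) @ beta
    return prs

def simulate_null(n=400, m=1000, seed=1):
    """Genotypes in {0, 1, 2} and a phenotype that is pure noise (assumed setup)."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.5, size=m)
    G = rng.binomial(2, freqs, size=(n, m)).astype(float)
    y = rng.standard_normal(n)
    return G, y

G, y = simulate_null()
r_naive = np.corrcoef(naive_in_sample_prs(G, y), y)[0, 1]
r_kfold = np.corrcoef(kfold_prs(G, y), y)[0, 1]
print(f"naive in-sample correlation: {r_naive:.3f}")
print(f"out-of-fold correlation:     {r_kfold:.3f}")
```

With more SNPs than individuals, the naive score correlates strongly with a phenotype that has no genetic component, while the out-of-fold score stays close to zero. The out-of-fold scheme re-estimates effects once per fold, which hints at why more exhaustive hold-out schemes become expensive as cohorts grow.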