Efficient Estimation and Applications of Cross-Validated Genetic Predictions to Polygenic Risk Scores and Linear Mixed Models.

Affiliations

Neurology, UCLA, Los Angeles, California.

School of Medicine, UCSF, San Francisco, California.

Publication information

J Comput Biol. 2020 Apr;27(4):599-612. doi: 10.1089/cmb.2019.0325. Epub 2020 Feb 20.

DOI:10.1089/cmb.2019.0325

PMID:32077750

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7185352/

Abstract

Large-scale cohorts with combined genetic and phenotypic data, coupled with methodological advances, have produced increasingly accurate genetic predictors of complex human phenotypes, called polygenic risk scores (PRSs). Beyond their potential translational impact in identifying at-risk individuals, PRSs are being used in a growing list of scientific applications, including causal inference, detection of pleiotropy and genetic correlation, and powerful gene-based and mixed-model association tests. Existing PRS approaches rely on external large-scale genetic cohorts in which the phenotype of interest has also been measured, and they further require matching on ancestry and on genotyping platform or imputation quality. In this work, we present a novel reference-free method that produces a PRS without relying on an external cohort. We show that naive implementations of reference-free PRSs result either in substantial overfitting or in prohibitive increases in computational time, and that our algorithm avoids both issues, producing informative in-sample PRSs within a single cohort without overfitting. We then demonstrate several novel applications of reference-free PRSs, including detection of pleiotropy across 246 metabolic traits and efficient mixed-model association testing.
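The abstract does not spell out the algorithm. The sketch below is an illustration under our own assumptions, not the authors' method: it uses ridge regression on standardized genotypes as the genetic predictor (closely related to BLUP under a linear mixed model), no intercept or covariates, and K-fold cross-validation; the helper names `naive_cv_prs` and `fast_cv_prs` are hypothetical. It reproduces the two failure modes named above on a simulated phenotype with no genetic signal: in-sample predictions from one full-data fit track the noise (overfitting), and naive cross-validation must refit the model once per fold. The fast route recovers the same out-of-fold predictions from a single full-data fit via the standard hat-matrix identity for held-out residuals, y_f - pred_f = (I - H_ff)^(-1) (y_f - yhat_f), where H = K (K + lam*I)^(-1) and K = X X'.

```python
import numpy as np


def naive_cv_prs(X, y, lam, folds):
    """Hypothetical helper: out-of-fold ridge predictions by refitting per fold."""
    p = X.shape[1]
    pred = np.empty_like(y)
    for f in np.unique(folds):
        test = folds == f
        Xtr, ytr = X[~test], y[~test]
        # Ridge effect sizes estimated without the held-out individuals (p x p solve).
        beta = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(p), Xtr.T @ ytr)
        pred[test] = X[test] @ beta
    return pred


def fast_cv_prs(X, y, lam, folds):
    """Hypothetical helper: same out-of-fold predictions from one full-data fit."""
    n = X.shape[0]
    K = X @ X.T  # n x n kernel, proportional to a GRM for standardized genotypes
    # Ridge hat matrix H = K (K + lam I)^{-1}; the two factors commute.
    H = np.linalg.solve(K + lam * np.eye(n), K)
    yhat = H @ y  # in-sample predictions (these overfit)
    pred = np.empty_like(y)
    for f in np.unique(folds):
        idx = np.flatnonzero(folds == f)
        H_ff = H[np.ix_(idx, idx)]
        # Held-out residual: (I - H_ff)^{-1} (y_f - yhat_f)
        resid = np.linalg.solve(np.eye(idx.size) - H_ff, y[idx] - yhat[idx])
        pred[idx] = y[idx] - resid
    return pred, yhat


# Simulated cohort (assumed values): 200 individuals, 1000 variants, null phenotype.
rng = np.random.default_rng(0)
n, p, lam = 200, 1000, 100.0
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)
X = (G - G.mean(axis=0)) / G.std(axis=0)  # standardized genotypes
y = rng.standard_normal(n)  # no genetic signal at all
folds = np.arange(n) % 5  # 5-fold assignment

cv_fast, in_sample = fast_cv_prs(X, y, lam, folds)
cv_naive = naive_cv_prs(X, y, lam, folds)

print("fast == naive:", np.allclose(cv_fast, cv_naive))
print("in-sample corr:", np.corrcoef(in_sample, y)[0, 1])
print("out-of-fold corr:", np.corrcoef(cv_fast, y)[0, 1])
```

On this null simulation the in-sample score correlates strongly with the phenotype despite there being nothing to predict, while the out-of-fold score should correlate only weakly; the two cross-validation routes should agree to numerical precision. The fast route costs one n x n solve plus one small solve per fold, rather than a p x p refit for every fold, which is the kind of saving that matters when p is in the hundreds of thousands.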
