Making the Most of Clumping and Thresholding for Polygenic Scores.

Affiliations

Laboratoire TIMC-IMAG, UMR 5525, Univ. Grenoble Alpes, CNRS, La Tronche, France; Department of Economics and Business Economics, National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark.

Department of Economics and Business Economics, National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark.

Publication information

Am J Hum Genet. 2019 Dec 5;105(6):1213-1221. doi: 10.1016/j.ajhg.2019.11.001. Epub 2019 Nov 21.

DOI:10.1016/j.ajhg.2019.11.001
PMID:31761295
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6904799/
Abstract

Polygenic prediction has the potential to contribute to precision medicine. Clumping and thresholding (C+T) is a widely used method to derive polygenic scores. When using C+T, several p value thresholds are tested to maximize predictive ability of the derived polygenic scores. Along with this p value threshold, we propose to tune three other hyper-parameters for C+T. We implement an efficient way to derive thousands of different C+T scores corresponding to a grid over four hyper-parameters. For example, it takes a few hours to derive 123K different C+T scores for 300K individuals and 1M variants using 16 physical cores. We find that optimizing over these four hyper-parameters improves the predictive performance of C+T in both simulations and real data applications as compared to tuning only the p value threshold. A particularly large increase can be noted when predicting depression status, from an AUC of 0.557 (95% CI: [0.544-0.569]) when tuning only the p value threshold to an AUC of 0.592 (95% CI: [0.580-0.604]) when tuning all four hyper-parameters we propose for C+T. We further propose stacked clumping and thresholding (SCT), a polygenic score that results from stacking all derived C+T scores. Instead of choosing one set of hyper-parameters that maximizes prediction in some training set, SCT learns an optimal linear combination of all C+T scores by using an efficient penalized regression. We apply SCT to eight different case-control diseases in the UK biobank data and find that SCT substantially improves prediction accuracy with an average AUC increase of 0.035 over standard C+T.
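
To make the idea of a hyper-parameter grid for C+T concrete, here is a minimal pure-Python sketch, not the authors' implementation. The abstract does not name the three additional hyper-parameters, so this sketch assumes two plausible ones (a squared-correlation cutoff `r2_max` and a clumping `window` in base pairs) alongside the p value threshold `p_max`. All function names and the toy data are hypothetical.

```python
from itertools import product


def squared_corr(x, y):
    """Squared Pearson correlation between two genotype columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def clump(genotypes, pvalues, positions, r2_max, window):
    """Greedy clumping: visit variants from most to least significant and
    drop any variant that is close to, and correlated with, a kept one."""
    order = sorted(range(len(pvalues)), key=lambda j: pvalues[j])
    kept = []
    for j in order:
        col_j = [row[j] for row in genotypes]
        redundant = any(
            abs(positions[j] - positions[k]) <= window
            and squared_corr(col_j, [row[k] for row in genotypes]) > r2_max
            for k in kept
        )
        if not redundant:
            kept.append(j)
    return kept


def ct_score(genotypes, betas, pvalues, kept, p_max):
    """Thresholding: sum effect-weighted allele counts over clumped
    variants whose p value is at most p_max."""
    selected = [j for j in kept if pvalues[j] <= p_max]
    return [sum(betas[j] * row[j] for j in selected) for row in genotypes]


def ct_grid(genotypes, betas, pvalues, positions, r2_grid, window_grid, p_grid):
    """One C+T score vector per hyper-parameter combination."""
    scores = {}
    for r2_max, window in product(r2_grid, window_grid):
        # Clumping does not depend on p_max, so it is reused across thresholds.
        kept = clump(genotypes, pvalues, positions, r2_max, window)
        for p_max in p_grid:
            scores[(r2_max, window, p_max)] = ct_score(
                genotypes, betas, pvalues, kept, p_max
            )
    return scores


# Toy data (made up): 4 individuals x 3 variants, allele counts in {0, 1, 2}.
genotypes = [[0, 0, 2], [1, 1, 0], [2, 2, 1], [1, 1, 1]]
betas = [0.5, 0.4, -0.2]        # hypothetical GWAS effect sizes
pvalues = [1e-8, 1e-6, 0.03]    # hypothetical GWAS p values
positions = [100, 150, 90000]   # hypothetical base-pair positions

scores = ct_grid(
    genotypes, betas, pvalues, positions,
    r2_grid=[0.2, 0.5], window_grid=[500, 100000], p_grid=[1e-5, 0.05],
)
```

Each key of `scores` is one hyper-parameter combination and each value is a score vector with one entry per individual. Standard C+T would keep the single combination that predicts best in a training set, whereas SCT as described above would instead feed all of these vectors as predictors into a penalized regression.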
