Almost Free Enhancement of Multi-Population PRS: From Data-Fission to Pseudo-GWAS Subsampling.

Authors

Xu Leqi, Dong Yikai, Zeng Xiaowei, Bian Zeyu, Zhou Geyu, Guan Leying, Zhao Hongyu

Affiliations

Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA.

Department of Statistics and Data Science, Fudan University, Shanghai, China.

Publication information

bioRxiv. 2025 Jun 20:2025.06.16.659952. doi: 10.1101/2025.06.16.659952.

DOI:10.1101/2025.06.16.659952
PMID:40611909
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12224544/
Abstract

Many multi-population polygenic risk score (PRS) methods have been proposed to improve prediction accuracy in underrepresented populations; however, no single method outperforms other methods across all data scenarios. Although integrating PRS results across multiple methods and populations may lead to more accurate predictions, this approach may be limited by the availability of individual-level tuning data to calculate combination weights. In this manuscript, we introduce MIXPRS, a robust PRS integration framework based on data fission principles, to effectively combine multiple multi-population PRS methods using only genome-wide association study (GWAS) summary statistics from multiple populations. Specifically, MIXPRS employs SNP pruning to mitigate linkage disequilibrium (LD) mismatch between the training GWAS summary statistics and LD reference panels, and utilizes non-negative least squares regression to robustly estimate PRS combination weights. Extensive simulations and real-data analyses involving 22 continuous traits and four binary traits across five populations from the UK Biobank and All of Us datasets demonstrate that MIXPRS consistently outperforms the existing methods in prediction accuracy. Because MIXPRS relies solely on GWAS summary statistics, it enjoys broad accessibility, robustness, and generalizability for underrepresented populations.
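
The two ingredients named in the abstract, data fission and non-negative least squares (NNLS), can be illustrated with a small sketch. This is not the authors' implementation. It assumes that SNPs are independent with standardized genotypes (as might roughly hold after aggressive pruning), so the LD matrix is treated as the identity. It also assumes each marginal effect estimate is Gaussian with a known standard error. The function names, the two toy candidate PRS weight vectors, and the parameter `tau` are hypothetical. A plain coordinate-descent solver stands in for a library NNLS routine so that the example needs only the standard library.

```python
import random


def fission_gaussian(beta_hat, se, tau, rng):
    """Gaussian data fission of summary-level effect estimates.

    If beta_hat ~ N(beta, se^2) and z ~ N(0, se^2) is drawn independently, then
      train = beta_hat + tau * z  ~ N(beta, (1 + tau^2) * se^2)
      tune  = beta_hat - z / tau  ~ N(beta, (1 + 1 / tau^2) * se^2)
    and the two copies are independent of each other.
    """
    train, tune = [], []
    for b, s in zip(beta_hat, se):
        z = rng.gauss(0.0, s)
        train.append(b + tau * z)
        tune.append(b - z / tau)
    return train, tune


def nnls_coordinate_descent(columns, target, n_iter=200):
    """Minimise ||target - sum_k w[k] * columns[k]||^2 subject to w[k] >= 0."""
    w = [0.0] * len(columns)
    residual = list(target)  # equals target minus the fit, since w starts at 0
    norms = [sum(c * c for c in col) for col in columns]
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            if norms[j] == 0.0:
                continue
            # Best w[j] with the other weights held fixed, clipped at zero.
            grad = sum(c * r for c, r in zip(col, residual))
            new_w = max(0.0, w[j] + grad / norms[j])
            delta = new_w - w[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                w[j] = new_w
    return w


# Toy data: 500 independent SNPs, about 10% of them causal (all values assumed).
rng = random.Random(0)
n_snps = 500
true_beta = [rng.gauss(0.0, 0.05) if rng.random() < 0.1 else 0.0
             for _ in range(n_snps)]
se = [0.02] * n_snps
beta_hat = [b + rng.gauss(0.0, s) for b, s in zip(true_beta, se)]

tau = 0.5
train, tune = fission_gaussian(beta_hat, se, tau, rng)

# Two hypothetical candidate PRS weight vectors, built from the training copy only.
prs_unshrunk = list(train)
prs_thresholded = [t if abs(t) > 0.06 else 0.0 for t in train]

# With an identity LD matrix, choosing combination weights to predict the
# phenotype reduces to NNLS of the tuning copy on the candidate weight vectors.
weights = nnls_coordinate_descent([prs_unshrunk, prs_thresholded], tune)
print("combination weights:", weights)
```

Because the tuning copy is independent of the training copy, the combination weights are fitted on pseudo-tuning data that the candidate scores have not seen, which is the role individual-level tuning data would otherwise play. With real summary statistics the LD structure cannot be ignored in this way, so the identity-LD shortcut is only a device to keep the sketch short.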

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1e7/12224544/91f2a42836f1/nihpp-2025.06.16.659952v1-f0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1e7/12224544/eed508aec80b/nihpp-2025.06.16.659952v1-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1e7/12224544/41cd306682cd/nihpp-2025.06.16.659952v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1e7/12224544/8c328e37b105/nihpp-2025.06.16.659952v1-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1e7/12224544/3465f7796dba/nihpp-2025.06.16.659952v1-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1e7/12224544/9d930a92acb9/nihpp-2025.06.16.659952v1-f0005.jpg
